To begin, let me answer the obvious question: why would anyone want to use Lua when already using Python? The answer is simple: legacy code. This experiment arose from evaluating candidates, such as Python or Erlang, to replace an existing C/C++ system whose throughput in requests per second was limited. In this post we will focus on Python, since this was an experiment born of my own curiosity. As for Lua, let's say a lot of business logic was already written in it and the requirement was to reuse that code base. I chose LuaJIT because it had already proved fast enough for our requirements. So the next obvious step was to make my async Python code talk to Lua. There is more than one way to accomplish that, but the option that struck me as most appealing was Cython, since it keeps most of my code in Python and generates the C code needed to communicate with LuaJIT. We will assess this combination step by step to see what works best. The source code for this blog is here.
Requirements and set up
1. Python 3.5
2. Cython
3. ZMQ
4. LuaJIT
First, create a virtual environment so that the rest of your Python setup stays safe from this experiment. Install virtualenv if it is not already installed, then create the environment (replacing the path to python with the one on your system) and activate it.

$> apt-get install virtualenv
$> virtualenv venv -p /usr/bin/python3.5
$> source venv/bin/activate

Once the virtualenv is activated, install all the packages required for this exercise.

apt-get install lua5.1 liblua5.1-dev luajit-5.1
pip install cython zmq aiozmq msgpack-python uvloop
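
Before moving on, it can help to confirm that the Python packages are importable from inside the virtualenv. The following is a minimal sketch, not part of the original setup: the helper name missing_packages is hypothetical, and the mapping simply pairs each pip name above with the module name it is normally imported under.

```python
import importlib.util

# pip package name -> module name it is imported under
REQUIRED = {
    "cython": "Cython",
    "zmq": "zmq",
    "aiozmq": "aiozmq",
    "msgpack-python": "msgpack",
    "uvloop": "uvloop",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose top-level module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```

If anything is listed as missing, the virtualenv is probably not active or the pip command above did not complete.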

Let's set up Lua to do something, not necessarily interesting. The following Lua code simply does some random calculations and returns the random value.

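-- lua_code.lua
function getRandomValue()
  print('Caling Random')
  x = math.random()
  x = math.floor(x * 10)
  x = x + 1
  return x -- an integer between 1 and 10
end

-- Top-level code runs once each time the file is loaded
print('Priming Run')
address = ''
port = 67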

As can be seen, the line print('Priming Run') is there to show that the Lua script, when loaded from the C side, does its priming once. If you run this code with the following command, it gives the output shown below:

$ lua lua_code.lua
Priming Run

It is time to write some C code to run this Lua file. Notice the main function at the end, which we will remove later so that this file can be called from Python.

/* lua_runner.c */
#include <stdio.h>
#include <lua.h>
#include <lualib.h>
#include <lauxlib.h>

/* Convenience stuff: close the Lua state automatically on scope exit */
static void close_state(lua_State **L) { if (*L) lua_close(*L); }
#define cleanup(x) __attribute__((cleanup(x)))
#define auto_lclose cleanup(close_state)

static int on_recv(lua_State *L, char *buf, size_t len)
{
  lua_getglobal(L, "getRandomValue");
  lua_pushlstring(L, buf, len); /* Binary strings are okay */
  int ret = lua_pcall(L, 1, 1, 0); /* 1 argument, 1 result */
  printf("ret: %d, buflen: %ld\n", ret, (long)lua_tointeger(L, -1));
  lua_pop(L, 1);
  return ret;
}

int myfunc()
{
  /* Create VM state */
  auto_lclose lua_State *L = luaL_newstate();
  if (!L)
    return 1;
  luaL_openlibs(L); /* Open standard libraries */
  /* Load config file */
  luaL_loadfile(L, "lua_code.lua"); /* (1) */
  int ret = lua_pcall(L, 0, 0, 0);
  if (ret != 0) {
    fprintf(stderr, "%s\n", lua_tostring(L, -1));
    return 1;
  }
  /* Read out config */
  lua_getglobal(L, "address"); /* (2) */
  lua_getglobal(L, "port");
  printf("address: %s, port: %ld\n", /* (3) */
         lua_tostring(L, -2), (long)lua_tointeger(L, -1));
  lua_settop(L, 0); /* (4) */
  char buf[512] = { 0x05, 'h', 'e', 'l', 'l', 'o' };
  return on_recv(L, buf, sizeof(buf));
}

/* Temporary entry point; removed once the file is called from Python */
int main()
{
  return myfunc();
}

To run this code, we first need to compile it. The following command compiles lua_runner.c into an a.out file, which can be run to produce the output shown below.

$ cc -w lua_runner.c -I/usr/include/luajit-2.0 -lluajit-5.1 -lm -ldl
$ ./a.out
Priming Run
address:, port: 67
Caling Random
ret: 0, buflen: 10

The Lua script is loaded by luaL_loadfile and executed by the first lua_pcall call, which prints the "Priming Run" output; the rest of the code then calls the getRandomValue function in the Lua code. Up to this point, all we have done is call Lua from C. Now we will call this Lua function from Python using Cython.

Python -> Cython -> Lua Code

To start using Cython, we must write Python a little differently than we are used to. Cython converts Python code to C code that is compiled into a loadable library. This library can be accessed from the normal Python interpreter (CPython) as if it were any other Python module. Cython has many benefits, which I will not go into in detail; to learn more, you can follow this link to its documentation. We start by removing the main function from the lua_runner.c file and creating a header file that declares myfunc(); we then write the Cython code and compile everything into a library, as shown below.

 /* lua_runner.h */
 #ifndef __MYCODE_H__
 #define __MYCODE_H__
 extern int myfunc();
 #endif //__MYCODE_H__

The lua_runner.c file no longer needs the main function, since we want to compile it to a .o object file. The header file is used by the Cython file (.pyx) shown below to access myfunc from our library.

# lua_runner_cython.pyx
cdef extern from "lua_runner.h":
    cdef int myfunc()

def callCfunc():
    return myfunc()

Run the following commands to compile the Cython file to a C file, compile all the C files to object files, and then link the object files into a shared object. The module can then be imported from the Python REPL.

$ cython lua_runner_cython.pyx
$ gcc -g -O2 -fpic -c lua_runner.c -o lua_runner.o -I/usr/include/luajit-2.0 -lluajit-5.1 -lm -ldl
$ gcc -g -O2 -fpic -c lua_runner_cython.c -o lua_runner_cython.o `python-config --cflags`
$ gcc -shared -o lua_runner_cython.so lua_runner.o lua_runner_cython.o `python-config --libs` -I/usr/include/luajit-2.0 -lluajit-5.1 -lm -ldl
$ python
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from lua_runner_cython import callCfunc
>>> callCfunc()
Priming Run
address:, port: 67
Caling Random
ret: 0, buflen: 5
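
The manual gcc steps above could also be expressed as a build script. The following is a rough sketch under assumptions, not the build used in this post: the file name setup.py and the helpers extension_kwargs and build are hypothetical, while the include path and library names are copied from the gcc commands above. It relies on setuptools' Extension and Cython's cythonize.

```python
# setup.py (hypothetical file name; not part of the original post)
import sys


def extension_kwargs():
    """Mirror the flags passed to gcc by hand in the article."""
    return {
        "name": "lua_runner_cython",
        "sources": ["lua_runner_cython.pyx", "lua_runner.c"],
        "include_dirs": ["/usr/include/luajit-2.0"],
        "libraries": ["luajit-5.1", "m", "dl"],
    }


def build():
    # Imported lazily so the configuration can be inspected without
    # setuptools or Cython being importable.
    from setuptools import Extension, setup
    from Cython.Build import cythonize

    setup(
        name="lua_runner_cython",
        ext_modules=cythonize([Extension(**extension_kwargs())]),
    )


# Only build when a command such as build_ext was given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    build()
```

With such a file in place, running python setup.py build_ext --inplace from the directory containing the .pyx, .c and .h files would be expected to produce the importable module in one step.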

As can be seen, we are able to import the lua_runner_cython module from the Python REPL and see the results of lua_code.lua when callCfunc() is called. The Python-to-Lua call path is now set up. It is time to build an async server that forwards calls to the Lua code, prints the results and returns a value. First, we will call the Lua code directly from our async server and see how much performance we can get out of such a system. I am running the code on a Vagrant box, so the resulting numbers are not representative of the true performance that can be achieved. Nevertheless, running the different approaches on the same machine lets us compare them and get an idea of which is most appropriate for an application like this.
The next step would most obviously be to call this function from a Python server, where the call represents a call to a backend service. However, I would prefer not to call it from a threaded server, since that requires locking the call so that only one thread can access it at a time. It is also a problem because calling into C from a thread that holds the GIL gains nothing in terms of concurrency. It therefore makes a lot of sense to me to put a queue between the threads that want to call the Cython function and the thread that does the actual calling. If we can maintain a proper RPC-style wait while calling, we do not have to worry about locks in our code.
Although this approach removes the need for the developer to use any locking, it simply hides the locking behaviour inside the queue data structure, which is not free from the problems caused by the GIL. We will therefore split the work into separate processes: one threaded server makes calls to the Cython function by putting messages on the queue, and one or more worker processes respond to them.
Since a queue is involved, the call becomes asynchronous and requires special attention. We need to make sure that messages arriving on the return queue are forwarded to the thread waiting for that response. We can do that with a correlation id that ties each response to its request. However, looking up the correct thread by correlation id means maintaining a separate data structure (a dictionary). Instead, we can simply use event queues, where the appropriate waiter is picked up by the event poll as and when messages arrive. Let's see how that works, and whether it works at all.
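To make the correlation-id bookkeeping concrete, here is a minimal in-process sketch using only the standard library. It is an illustration of the idea being discussed, not the approach taken in the rest of this post: QueueRpc, backend and run_demo are hypothetical names, backend stands in for the Cython call, and threads are used in place of separate processes to keep the example short.

```python
import itertools
import queue
import threading


def backend(payload):
    """Stand-in for the Cython call (callCfunc in the article)."""
    return payload * 2


class QueueRpc:
    """Route queued responses back to callers by correlation id."""

    def __init__(self):
        self.requests = queue.Queue()
        self.responses = queue.Queue()
        self.pending = {}  # correlation id -> slot of the waiting caller
        self.lock = threading.Lock()
        self.ids = itertools.count(1)
        self.threads = [
            threading.Thread(target=self._worker, daemon=True),
            threading.Thread(target=self._dispatcher, daemon=True),
        ]
        for thread in self.threads:
            thread.start()

    def _worker(self):
        # The only thread that ever touches the backend.
        while True:
            item = self.requests.get()
            if item is None:  # shutdown marker
                self.responses.put(None)
                return
            corr_id, payload = item
            self.responses.put((corr_id, backend(payload)))

    def _dispatcher(self):
        # Look up the waiting caller by correlation id and wake it.
        while True:
            item = self.responses.get()
            if item is None:
                return
            corr_id, result = item
            with self.lock:
                slot = self.pending.pop(corr_id)
            slot["result"] = result
            slot["done"].set()

    def call(self, payload):
        slot = {"done": threading.Event(), "result": None}
        with self.lock:
            corr_id = next(self.ids)
            self.pending[corr_id] = slot
        self.requests.put((corr_id, payload))
        slot["done"].wait()  # RPC-style blocking wait
        return slot["result"]

    def close(self):
        self.requests.put(None)
        for thread in self.threads:
            thread.join()


def run_demo():
    rpc = QueueRpc()
    results = {}

    def client(n):
        results[n] = rpc.call(n)

    callers = [threading.Thread(target=client, args=(n,)) for n in range(5)]
    for caller in callers:
        caller.start()
    for caller in callers:
        caller.join()
    rpc.close()
    return results
```

The pending dictionary and its lock are exactly the extra bookkeeping that an event-driven design hands over to the event loop.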
Python asyncio calls Lua through the Cython interface
The reasoning behind using asyncio is to show how easy it is to write a simple RPC-based system; we are using ZMQ as the queueing system. The asyncio server is capable of handling my requests from the client, and it is preferable not to have the Lua code running in the same process as the server module. It was also one of the things I wanted to test: whether asyncio can communicate via ZMQ and call Cython code. Since Cython uses threads a little differently than the asyncio module, it made sense to keep them separate. Why ZMQ? The answer will be clear very shortly. Following is the code for both the client and the server running on the same event loop. We could have put them in separate files, but that would have been inconsequential to the outcome of this exercise.

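import asyncio
import zmq
import aiozmq
import uvloop
import time
import os
import aiozmq.rpc

from lua_runner_cython import callCfunc


class ServerHandler(aiozmq.rpc.AttrHandler):

    # Methods must be marked with this decorator to be callable over RPC
    @aiozmq.rpc.method
    def remote_func(self, a: int, b: int) -> int:
        return callCfunc()


async def go():
    server = await aiozmq.rpc.serve_rpc(
        ServerHandler(), bind='tcp://*:*')
    server_addr = list(server.transport.bindings())[0]
    client = await aiozmq.rpc.connect_rpc(connect=server_addr)
    count = 0
    while True:
        # The handler ignores its arguments; the first value here is a
        # placeholder, since only the second (2) survived in the listing.
        ret = await client.call.remote_func(1, 2)
        if ret < 5:
            count += 1
            if count > 5:
                break
        print("Returned from lua -> cythond -> asyncio:", ret)

    # Close both sockets and wait for them to shut down
    server.close()
    await server.wait_closed()
    client.close()
    await client.wait_closed()


def main():
    # Reconstructed body (assumption): run go() on the default event loop
    asyncio.get_event_loop().run_until_complete(go())


if __name__ == '__main__':
    main()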

As you can see, there is a single go() coroutine, which creates both the client and the server ZMQ sockets and then calls the remote function declared by the ServerHandler class. ServerHandler extends AttrHandler from the aiozmq library and is responsible for making the call to the Cython code. The loop that calls the remote function counts the returned random values (in the range 1 to 10) that are less than 5 and breaks once that count exceeds 5. Once the loop terminates, we close both the client and the server socket and we are done. On running this code you see the following output.

Returned from lua -> cythond -> asyncio: 9
Priming Run
address: (null), port: 0
Caling Random
Returned from lua -> cythond -> asyncio: 9
Priming Run
address: (null), port: 0
Caling Random
Returned from lua -> cythond -> asyncio: 9
Priming Run
address: (null), port: 0
Caling Random
Returned from lua -> cythond -> asyncio: 9
Priming Run
address: (null), port: 0
Caling Random
Returned from lua -> cythond -> asyncio: 9
Priming Run
address: (null), port: 0
Caling Random
Returned from lua -> cythond -> asyncio: 2
Priming Run
address: (null), port: 0
Caling Random
Returned from lua -> cythond -> asyncio: 2
Priming Run
address: (null), port: 0
Caling Random
Returned from lua -> cythond -> asyncio: 2
Priming Run
address: (null), port: 0
Caling Random
Returned from lua -> cythond -> asyncio: 2
Priming Run
address: (null), port: 0
Caling Random

But there is one problem that we have not yet handled: the Lua code is loaded every time the Cython function is called. This is very bad for performance, and it also means that the pointers already created on earlier runs (such as the lua_State) are not reused. You can also see that the priming run is repeated, which means the Lua file is being prepped on every function call. This needs to go, and go it will in the follow-up article to this one. For now, we will stop at having an asyncio-based caller invoke Cython, which in turn calls the Lua code via LuaJIT.
I hope readers enjoyed this exercise. Either way, please leave a comment with any suggestions, improvements or corrections.
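The difference between reloading on every call and loading once can be modelled in a few lines of plain Python. This is only an illustration of the problem under assumptions, not the fix from the follow-up article: FakeLuaState is a hypothetical stand-in that counts loads instead of running a real Lua VM, and call_reloading and call_cached are made-up names.

```python
import random


class FakeLuaState:
    """Hypothetical stand-in for a loaded Lua state; not a real binding."""

    loads = 0  # how many times the "script" has been loaded

    def __init__(self):
        FakeLuaState.loads += 1  # corresponds to one 'Priming Run'

    def get_random_value(self):
        return random.randint(1, 10)


def call_reloading():
    """What myfunc() does today: build a fresh state for every call."""
    return FakeLuaState().get_random_value()


_state = None


def call_cached():
    """Load once, then reuse the same state on later calls."""
    global _state
    if _state is None:
        _state = FakeLuaState()
    return _state.get_random_value()
```

Calling call_reloading three times bumps FakeLuaState.loads by three, while any number of call_cached calls adds only one load, which is the behaviour we would want from the real Lua state.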
Until next time.