Memory-efficient Python with bytearray

The story

I was developing an audio broadcasting server, written with Twisted.  It worked fine, but there was still a big problem to solve: memory usage.  My audio broadcasting server used far too much memory, as you can see from the following figure.

As you can see, memory usage goes up like crazy as more listeners come on line; the growth is almost exponential, and it makes no sense for the server to need that much memory.  So I started looking for the cause of the overuse.  At first I suspected a memory leak in the C modules of Twisted or some other third-party package.  But that didn't add up: if it were a leak, why was there no corresponding rise in memory usage at every peak of radios/listeners?

Surprising result from guppy

I inspected the memory usage in detail with guppy and got a surprising result:


Partition of a set of 116280 objects. Total size = 9552004 bytes.
 Index  Count   %     Size   %  Cumulative   %  Type
     0  52874  45  4505404  47     4505404  47  str
     1   5927   5  2231096  23     6736500  71  dict
     2  29215  25  1099676  12     7836176  82  tuple
     3   7503   6   510204   5     8346380  87  types.CodeType
     4   7625   7   427000   4     8773380  92  function
     5    672   1   292968   3     9066348  95  type
     6    866   1    82176   1     9148524  96  list
     7   1796   2    71840   1     9220364  97  __builtin__.weakref
     8   1140   1    41040   0     9261404  97  __builtin__.wrapper_descriptor
     9   2603   2    31236   0     9292640  97  int

Even though the RSS reported by ps was almost 100 MB, very little memory was actually in use by Python objects.  So I considered another possible cause: fragmentation.

Fragmentation

I wasn't sure at first, and I hadn't done any memory-leak detection on the server; I was just guessing.  I reviewed the design of the server.  The following class is the core of the design.

import logging

# stand-in logger for the log.debug call below; the original server
# presumably had its own logging setup
log = logging.getLogger(__name__)


class AudioStream(object):
    """Audio stream

    """

    def __init__(self, memoryLimit=128*1024):
        """

        @param memoryLimit: limit of memory usage in bytes
        """
        # limit of memory usage
        self.memoryLimit = memoryLimit
        # offset of audio stream (how many chunks)
        self.offset = 0
        # queue for audio data chunk
        self.chunks = []
        # total bytes of chunk in queue
        self.totalSize = 0

    def write(self, chunk):
        """Write audio data to audio stream

        @param chunk: audio data chunk to write
        """
        # append chunk to queue
        self.chunks.append(chunk)
        self.totalSize += len(chunk)

        # check the usage of memory, if exceeded, pop chunks
        while self.totalSize > self.memoryLimit:
            poppedSize = len(self.chunks.pop(0))
            self.totalSize -= poppedSize
            self.offset += 1
            log.debug('AudioStream pop chunk %d, remaining size %d, offset %d',
                poppedSize, self.totalSize, self.offset)

It is a simple idea: every radio has its own buffer, a list of strings, and when a listener needs more data we just pick up a chunk and send it to the peer.  A list of strings?  That, I wondered, might be the reason for the memory overuse.  So I tried to understand how Python manages memory, and found the article Improving Python’s Memory Allocator.  Then I understood how it works.  It’s not difficult: every object smaller than 256 bytes is allocated from Python’s memory pool; anything larger is allocated with malloc.

Now I could imagine what was happening behind the scenes.  The chunks are not of fixed size; they range from roughly 200~300 bytes up to 1xxx bytes.  The chunk list is capped at a fixed total size, and when the total grows too big, old chunks are simply popped off.  This is the key to the fragmentation.  Imagine:

  1. A chunk of 987 bytes occupies a piece of memory at 0x000123.
  2. When it is no longer needed it is freed, leaving a free memory chunk of 987 bytes at 0x000123.
  3. Another allocation request arrives, this time for 768 bytes; malloc finds the free chunk at 0x000123, claims it, and returns.
  4. There is the fragment!  987 - 768 = 219: a small piece of free memory, too small to be of much use.

This happens again, again and again while the server is serving, so more and more fragments pile up!  I finally knew the reason for the memory overuse.  Then came another question: how to fix it?  A list of strings is an easy way to store buffered chunks, but it creates fragments.  I figured a Python C module providing memory-chunk allocation and access functions might be a good idea.
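The numbered steps above can be simulated with a toy first-fit allocator (purely illustrative; real malloc implementations are far more sophisticated, and all the names here are mine):

```python
# Toy first-fit free list: each entry is an (address, size) hole.
free_list = [(0x000123, 987)]        # step 2: a freed 987-byte chunk

def alloc(size):
    """First-fit allocation: claim the first hole that is big enough,
    leaving any remainder behind as a smaller free hole."""
    for i, (addr, hole) in enumerate(free_list):
        if hole >= size:
            if hole == size:
                free_list.pop(i)     # perfect fit: the hole disappears
            else:
                # shrink the hole; the leftover is the fragment
                free_list[i] = (addr + size, hole - size)
            return addr
    raise MemoryError(size)

addr = alloc(768)                    # step 3: reuse the 987-byte hole
# step 4: a 987 - 768 = 219 byte fragment remains in the free list
```

Each allocate/free cycle with mismatched sizes leaves another sliver like this behind, which is exactly the pattern the chunk list produced.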

The final solution: bytearray

I started to write my memory-chunk allocation C module.  But while working on it, I found a better solution that fits exactly what I need: bytearray.  I discovered it in the CPython source code.  It is a single memory chunk; you can change its size, in which case it may be reallocated, but as long as you don’t change its length it stays in the same memory buffer.  At first I was curious: why couldn’t I find anything about bytearray in the Python 2.6 documentation?  I then noticed it is a back-port from Python 3, which is why it was undocumented.  Okay, now let’s see the better solution built on bytearray:
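Before the class itself, a minimal sketch of the property we rely on: assigning a slice of the same length into a bytearray rewrites the bytes in place, so the buffer is never reallocated.

```python
buf = bytearray(8)              # one 8-byte chunk, allocated once
buf[0:4] = b'abcd'              # same-length slice assignment: no resize
assert len(buf) == 8            # length unchanged, so no reallocation
buf[2:6] = b'WXYZ'              # overwrite in the middle, still in place
assert bytes(buf) == b'abWXYZ\x00\x00'
```

If the assigned slice had a different length, the bytearray would grow or shrink and might be reallocated; keeping every write exactly block-sized is what makes the buffer stable.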

class AudioStream(object):
    """Audio stream

    """

    def __init__(self, size=1024, count=128):
        """

        The underlying bytearray is one big memory chunk; it buffers all
        incoming audio data.  The chunk is divided into blocks, which are
        the basic units sent to a peer.

        <-------------- Memory chunk ------------------>
        <--Block1--><--Block2--><--Block3--><--Block4-->
        ^           ^           ^           ^
        L1          L2          L3          L4

        We map blocks to the real audio stream

        <------------------ Audio Stream -------------->  ---> time goes
        <--Block3--><--Block4--><--Block1--><--Block2-->

                          Map to

        <-------------- Memory chunk ------------------>
        <--Block1--><--Block2--><--Block3--><--Block4-->

        Every listener keeps its own offset into the whole audio stream, so
        we know which block it has reached.

        ------------<------------------ Audio Stream --------------> --->
                    <--Block3--><--Block4--><--Block1--><--Block2-->
        ^
        L5

        When a listener's offset points to a position outside the buffer
        window, we move it forward to the first block currently buffered.

        ------------<------------------ Audio Stream --------------> --->
                    <--Block3--><--Block4--><--Block1--><--Block2-->
                    ^
                    L5

        @param size: size of block
        @param count: count of blocks
        """
        self._size = size
        self._count = count
        self._bufferSize = size*count

        # offset of begin of buffer window in audio stream
        self._offset = 0
        # bytes array
        self._bytes = bytearray(self.bufferSize)
        # small chunks, they are not big enough to fit a block
        self._pieces = []
        # total size of pieces
        self._pieceSize = 0

    def _getSize(self):
        return self._size
    size = property(_getSize)

    def _getCount(self):
        return self._count
    count = property(_getCount)

    def _getOffset(self):
        return self._offset
    offset = property(_getOffset)

    def _getBufferSize(self):
        return self._bufferSize
    bufferSize = property(_getBufferSize)

    def write(self, chunk):
        """Write audio data to audio stream

        @param chunk: audio data chunk to write
        """
        # append chunk to pieces
        self._pieces.append(chunk)
        self._pieceSize += len(chunk)

        while self._pieceSize >= self.size:
            total = ''.join(self._pieces)
            block = total[:self.size]
            # some data remains beyond this block; keep it for the next one
            if self._pieceSize - self.size > 0:
                self._pieces = [total[self.size:]]
                self._pieceSize = len(self._pieces[0])
            else:
                self._pieces = []
                self._pieceSize = 0

            # write the block to buffer
            begin = self.offset % self.bufferSize
            oldSize = len(self._bytes)
            self._bytes[begin:begin+self.size] = block
            assert len(self._bytes) == oldSize, "buffer size is changed"

            self._offset += len(block)

    def read(self, offset):
        """Read a block from audio stream

        @param offset: offset to read block
        @return: (block, new offset)
        """
        begin = offset % self.bufferSize
        assert begin >= 0
        assert begin < self.bufferSize
        block = str(self._bytes[begin:begin+self.size])
        offset += self.size
        return block, offset
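To see the wrap-around behaviour concretely, here is a stripped-down, standalone sketch of the same ring-buffer idea (the names and sizes are mine, not part of the class above):

```python
SIZE, COUNT = 4, 3
buf = bytearray(SIZE * COUNT)         # one fixed allocation, never resized
offset = 0                            # absolute write position in the stream

def write_block(block):
    """Write one fixed-size block at the current offset, wrapping around."""
    global offset
    begin = offset % len(buf)
    buf[begin:begin + SIZE] = block   # in-place, same-length slice assignment
    offset += SIZE

def read_block(pos):
    """Read the block at an absolute stream position."""
    begin = pos % len(buf)
    return bytes(buf[begin:begin + SIZE]), pos + SIZE

for data in (b'AAAA', b'BBBB', b'CCCC', b'DDDD'):
    write_block(data)

# The fourth block wrapped around and overwrote the first,
# so reading absolute position 0 now yields the fourth block.
block, nxt = read_block(0)
```

A reader whose position has fallen more than `len(buf)` bytes behind `offset` would see already-overwritten data, which is why the class above moves stale listeners forward to the first block still in the window.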


With the new design there are no more constant allocations and deallocations, which saves a huge amount of memory.  Here is the memory usage figure for the new design:

As you can see, we have almost 800 listeners on line and the memory usage is still low and stable.  The original server used over 100 MB with only 6x listeners on the line.  That's a huge difference.

