+++ /dev/null
-APC Quick-Start Braindump
-
-This is a rapidly written braindump of how APC currently works in the
-form of a quick-start guide to start hacking on APC.
-
-1. Install and use APC a bit so you know what it does from the end-user's
- perspective.
- user-space functions are all explained here:
-
-2. Grab the current APC code from CVS:
-
- cvs -d:pserver:cvsread@cvs.php.net:/repository login
- Password: phpfi
- cvs -d:pserver:cvsread@cvs.php.net:/repository co pecl/apc
-
- apc/php_apc.c has most of the code for the user-visible stuff. It is
- also a regular PHP extension in the sense that there are MINIT, MINFO,
- MSHUTDOWN, RSHUTDOWN, etc. functions.
-
-3. Build it.
-
- cd pecl/apc
- phpize
- ./configure --enable-apc --enable-mmap
- make
- cp modules/apc.so /usr/local/lib/php
- apachectl restart
-
-4. Debugging Hints
-
- apachectl stop
- gdb /usr/bin/httpd
- break ??
- run -X
-
- Grab the .gdbinit from the PHP source tree and have a look at the macros.
-
-5. Look through apc/apc_sma.c
- It is a pretty standard memory allocator.
-
- apc_sma_malloc, apc_sma_realloc, apc_sma_strdup and apc_sma_free behave to the
- caller just like malloc, realloc, strdup and free
-
- On server startup the MINIT hook in php_apc.c calls apc_module_init() in
- apc_main.c which in turn calls apc_sma_init(). apc_sma_init calls into
- apc_mmap.c to mmap the specified sized segment (I tend to just use a single
- segment). apc_mmap.c should be self-explanatory. It mmaps a temp file and
- then unlinks that file right after the mmap to provide automatic shared memory
- cleanup in case the process dies.
-
- Once the region has been initialized we stick a header_t at the beginning
- of the region. It contains the total size in header->segsize and the number
- of bytes available in header->avail.
-
- After the header comes a bit of a hack. A zero-sized block is inserted just
- to make things easier later on. And then a huge block that is basically
- the size of the entire segment minus the two (for the 0-sized block, and this one)
- block headers.
-
- The code for this is:
-
- header = (header_t*) shmaddr;
- header->segsize = sma_segsize;
- header->avail = sma_segsize - sizeof(header_t) - sizeof(block_t) - alignword(sizeof(int));
- memset(&header->lock,0,sizeof(header->lock));
- sma_lock = &header->lock;
- block = BLOCKAT(sizeof(header_t));
- block->size = 0;
- block->next = sizeof(header_t) + sizeof(block_t);
- block = BLOCKAT(block->next);
- block->size = header->avail;
- block->next = 0;
-
- So the shared memory looks like this:
-
- +--------+-------+---------------------------------+
- | header | block | block |
- +--------+-------+---------------------------------+
-
- sma_shmaddrs[0] gives you the address of header
-
- The blocks are just a simple offset-based linked list (so no pointers):
-
- typedef struct block_t block_t;
- struct block_t {
- size_t size; /* size of this block */
- size_t next; /* offset in segment of next free block */
- size_t canary; /* canary to check for memory overwrites */
-#ifdef __APC_SMA_DEBUG__
- int id; /* identifier for the memory block */
-#endif
- };
-
- The BLOCKAT macro turns an offset into an actual address for you:
-
- #define BLOCKAT(offset) ((block_t*)((char *)shmaddr + offset))
-
- where shmaddr = sma_shaddrs[0]
-
- And the OFFSET macro goes the other way:
-
- #define OFFSET(block) ((int)(((char*)block) - (char*)shmaddr))
-
- Allocating a block with a call to apc_sma_allocate() walks through the
- linked list of blocks until it finds one that is >= to the requested size.
- The first call to apc_sma_allocate() will hit the second block. We then
- chop up that block so it looks like this:
-
- +--------+-------+-------+-------------------------+
- | header | block | block | block |
- +--------+-------+-------+-------------------------+
-
- Then we unlink that block from the linked list so it won't show up
- as an available block on the next allocate. So we actually have:
-
- +--------+-------+ +-------------------------+
- | header | block |------>| block |
- +--------+-------+ +-------------------------+
-
- And header->avail along with block->size of the remaining large
- block are updated accordingly. The arrow there representing the
- link which now points to a block with an offset further along in
- the segment.
-
- When the block is freed using apc_sma_deallocate() the steps are
- basically just reversed. The block is put back and then the deallocate
- code looks at the block before and after to see if the block immediately
- before and after are free and if so the blocks are combined. So you never
- have 2 free blocks next to each other, apart from at the front with that
- 0-sized dummy block. This mostly prevents fragmentation. I have been
- toying with the idea of always allocating block at 2^n boundaries to make
- it more likely that they will be re-used to cut down on fragmentation further.
- That's what the POWER_OF_TWO_BLOCKSIZE you see in apc_sma.c is all about.
-
- Of course, anytime we fiddle with our shared memory segment we lock using
- the locking macros, LOCK() and UNLOCK().
-
- That should mostly take care of the low-level shared memory handling.
-
-6. Next up is apc_main.c and apc_cache.c which implement the meat of the
- cache logic.
-
- The apc_main.c file mostly calls functions in apc_sma.c to allocate memory
- and apc_cache.c for actual cache manipulation.
-
- After the shared memory segment is created and the caches are initialized,
- apc_module_init() installs the my_compile_file() function overriding Zend's
- version. I'll talk about my_compile_file() and the rest of apc_compile.c
- in the next section. For now I will stick with apc_main.c and apc_cache.c
- and talk about the actual caches. A cache consists of a block of shared
- memory returned by apc_sma_allocate() via apc_sma_malloc(). You will
- notice references to apc_emalloc(). apc_emalloc() is just a thin wrapper
- around PHP's own emalloc() function which allocates per-process memory from
- PHP's pool-based memory allocator. Don't confuse apc_emalloc() and
- apc_sma_malloc() as the first is per-process and the second is shared memory.
-
- The cache is stored in/described by this struct allocated locally using
- emalloc():
-
- struct apc_cache_t {
- void* shmaddr; /* process (local) address of shared cache */
- header_t* header; /* cache header (stored in SHM) */
- slot_t** slots; /* array of cache slots (stored in SHM) */
- int num_slots; /* number of slots in cache */
- int gc_ttl; /* maximum time on GC list for a slot */
- int ttl; /* if slot is needed and entry's access time is older than this ttl, remove it */
- };
-
- Whenever you see functions that take a 'cache' argument, this is what they
- take. And apc_cache_create() returns a pointer to this populated struct.
-
- At the beginning of the cache we have a header. Remember, we are down a level now
- from the sma stuff. The sma stuff is the low-level shared-memory allocator which
- has its own header which is completely separate and invisible to apc_cache.c.
- As far as apc_cache.c is concerned the block of memory it is working with could
- have come from a call to malloc().
-
- The header looks like this:
-
- typedef struct header_t header_t;
- struct header_t {
- int num_hits; /* total successful hits in cache */
- int num_misses; /* total unsuccessful hits in cache */
- slot_t* deleted_list; /* linked list of to-be-deleted slots */
- };
-
- Since this is at the start of the shared memory segment, these values are accessible
- across all the yapache processes and hence access to them has to be locked.
-
- After the header we have an array of slots. The number of slots is user-defined
- through the apc.num_slots ini hint. Each slot is described by:
-
- typedef struct slot_t slot_t;
- struct slot_t {
- apc_cache_key_t key; /* slot key */
- apc_cache_entry_t* value; /* slot value */
- slot_t* next; /* next slot in linked list */
- int num_hits; /* number of hits to this bucket */
- time_t creation_time; /* time slot was initialized */
- time_t deletion_time; /* time slot was removed from cache */
- time_t access_time; /* time slot was last accessed */
- };
-
- The slot_t *next there is a linked list to other slots that happened to hash to the
- same array position.
-
- apc_cache_insert() shows what happens on a new cache insert.
-
- slot = &cache->slots[hash(key) % cache->num_slots];
-
- cache->slots is our array of slots in the segment. hash() is simply:
-
- static unsigned int hash(apc_cache_key_t key)
- {
- return key.data.file.device + key.data.file.inode;
- }
-
- That is, we use the file's device and inode to uniquely identify it. Initially
- we had used the file's full path, but getting that requires a realpath() call which
- is amazingly expensive since it has to stat each component of the path to resolve
- symlinks and get rid of relative path components. By using the device+inode we
- can uniquely identify a file with a single stat.
-
- So, on an insert we find the array position in the slots array by hasing the device+inode.
- If there are currently no other slots there, we just create the slot and stick it into
- the array:
-
- *slot = make_slot(key, value, *slot, t)
-
- If there are other slots already at this position we walk the link list to get to
- the end. Here is the loop:
-
- while (*slot) {
- if (key_equals((*slot)->key.data.file, key.data.file)) {
- /* If existing slot for the same device+inode is different, remove it and insert the new version */
- if ((*slot)->key.mtime != key.mtime) {
- remove_slot(cache, slot);
- break;
- }
- UNLOCK(cache);
- return 0;
- } else if(cache->ttl && (*slot)->access_time < (t - cache->ttl)) {
- remove_slot(cache, slot);
- continue;
- }
- slot = &(*slot)->next;
- }
-
- That first key_equals() check sees if we have an exact match meaning the file
- is already in the cache. Since we try to find the file in the cache before doing
- an insert, this will generally only happen if another process managed to beat us
- to inserting it. If we have a newer version of the file at this point we remove
- it an insert the new version. If our version is not newer we just return without
- doing anything.
-
- While walking the linked list we also check to see if the cache has a TTL defined.
- If while walking the linked list we see a slot that has expired, we remove it
- since we are right there looking at it. This is the only place we remove stale
- entries unless the shared memory segment fills up and we force a full expunge via
- apc_cache_expunge(). apc_cache_expunge() walks the entire slots array and walks
- down every linked list removing stale slots to free up room. This is obviously
- slow and thus only happens when we have run out of room.
-
- apc_cache_find() simply hashes and returns the entry if it is there. If it is there
- but older than the mtime in the entry we are looking for, we delete the one that is
- there and return indicating we didn't find it.
-
- Next we need to understand what an actual cache entry looks like. Have a look at
- apc_cache.h for the structs. I sort of glossed over the key part earlier saying
- that we just used the device+inode to find a hash slot. It is actually a bit more
- complex than that because we have two kinds of caches. We have the standard file
- cache containing opcode arrays, but we also have a user-controlled cache that the
- user can insert whatever they want into via apc_store(). For the user cache we
- obviously don't have a device+inode. The actual identifier is provided by the user
- as a char *. So the key is actually a union that looks like this:
-
- typedef union _apc_cache_key_data_t {
- struct {
- int device; /* the filesystem device */
- int inode; /* the filesystem inode */
- } file;
- struct {
- char *identifier;
- } user;
- } apc_cache_key_data_t;
-
- struct apc_cache_key_t {
- apc_cache_key_data_t data;
- int mtime; /* the mtime of this cached entry */
- };
-
- And we have two sets of functions to do inserts and finds. apc_cache_user_find()
- and apc_cache_user_insert() operate on the user cache.
-
- Ok, on to the actual cache entry. Again, because we have two kinds of caches, we
- also have the corresponding two kinds of cache entries described by this union:
-
- typedef union _apc_cache_entry_value_t {
- struct {
- char *filename; /* absolute path to source file */
- zend_op_array* op_array; /* op_array allocated in shared memory */
- apc_function_t* functions; /* array of apc_function_t's */
- apc_class_t* classes; /* array of apc_class_t's */
- } file;
- struct {
- char *info;
- zval *val;
- unsigned int ttl;
- } user;
- } apc_cache_entry_value_t;
-
- And then the actual cache entry:
-
- struct apc_cache_entry_t {
- apc_cache_entry_value_t data;
- unsigned char type;
- int ref_count;
- };
-
- The user entry is pretty simple and not all that important for now. I will
- concentrate on the file entries since that is what holds the actual compiled
- opcode arrays along with the functions and classes required by the executor.
-
- apc_cache_make_file_entry() in apc_cache.c shows how an entry is constructed.
- The main thing to understand here is that we need more than just the opcode
- array, we also need the functions and classes created by the compiler when it
- created the opcode array. As far as the executor is concerned, it doesn't know
- that it isn't operating in normal mode being called right after the parse/compile
- phase, so we need to recreate everything so it looks exactly like it would at
- that point.
-
-7. my_compile_file() and apc_compile.c
-
- my_compile_file() in apc_main.c controls where we get the opcodes from. If
- the user-specified filters exclude the file from being cached, then we just
- call the original compile function and return. Otherwise we fetch the request
- time from Apache to avoid an extra syscall, create the key so we can look up
- the file in the cache. If we find it we stick it on a local stack which we
- use at cleanup time to make sure we return everything back to normal after a
- request and call cached_compile() which installs the functions and classes
- associated with the op_array in this entry and then copy the op_array down
- into our memory space for execution.
-
- If we didn't find the file in the cache, we need to compile it and insert it.
- To compile it we simply call the original compile function:
-
- op_array = old_compile_file(h, type TSRMLS_CC);
-
- To do the insert we need to copy the functions, classes and the opcode array
- the compile phase created into shared memory. This all happens in apc_compile.c
- in the apc_copy_op_array(), apc_copy_new_functions() and apc_copy_new_classes()
- functions. Then we make the file entry and do the insert. Both of these
- operations were described in the previous section.
-
-8. The Optimizer
-
- The optimizer has been deprecated.
-
-If you made it to the end of this, you should have a pretty good idea of where things are in
-the code. I skimmed over a lot of things, so plan on spending some time reading through the code.
-