1 APC Quick-Start Braindump
3 This is a rapidly written braindump of how APC currently works in the
4 form of a quick-start guide to start hacking on APC.
6 1. Install and use APC a bit so you know what it does from the end-user's
8 user-space functions are all explained here:
10 2. Grab the current APC code from CVS:
12 cvs -d:pserver:cvsread@cvs.php.net:/repository login
14 cvs -d:pserver:cvsread@cvs.php.net:/repository co pecl/apc
16 apc/php_apc.c has most of the code for the user-visible stuff. It is
17 also a regular PHP extension in the sense that there are MINIT, MINFO,
18 MSHUTDOWN, RSHUTDOWN, etc. functions.
24 ./configure --enable-apc --enable-mmap
26 cp modules/apc.so /usr/local/lib/php
36 Grab the .gdbinit from the PHP source tree and have a look at the macros.
38 5. Look through apc/apc_sma.c
39 It is a pretty standard memory allocator.
41 apc_sma_malloc, apc_sma_realloc, apc_sma_strdup and apc_sma_free behave to the
42 caller just like malloc, realloc, strdup and free
44 On server startup the MINIT hook in php_apc.c calls apc_module_init() in
45 apc_main.c which in turn calls apc_sma_init(). apc_sma_init calls into
46 apc_mmap.c to mmap the specified sized segment (I tend to just use a single
47 segment). apc_mmap.c should be self-explanatory. It mmaps a temp file and
48 then unlinks that file right after the mmap to provide automatic shared memory
49 cleanup in case the process dies.
51 Once the region has been initialized we stick a header_t at the beginning
52 of the region. It contains the total size in header->segsize and the number
53 of bytes available in header->avail.
55 After the header comes a bit of a hack. A zero-sized block is inserted just
56 to make things easier later on. And then a huge block that is basically
57 the size of the entire segment minus the two (for the 0-sized block, and this one)
62 header = (header_t*) shmaddr;
63 header->segsize = sma_segsize;
64 header->avail = sma_segsize - sizeof(header_t) - sizeof(block_t) - alignword(sizeof(int));
65 memset(&header->lock,0,sizeof(header->lock));
66 sma_lock = &header->lock;
67 block = BLOCKAT(sizeof(header_t));
69 block->next = sizeof(header_t) + sizeof(block_t);
70 block = BLOCKAT(block->next);
71 block->size = header->avail;
74 So the shared memory looks like this:
76 +--------+-------+---------------------------------+
77 | header | block | block |
78 +--------+-------+---------------------------------+
80 sma_shmaddrs[0] gives you the address of header
82 The blocks are just a simple offset-based linked list (so no pointers):
84 typedef struct block_t block_t;
86 size_t size; /* size of this block */
87 size_t next; /* offset in segment of next free block */
88 size_t canary; /* canary to check for memory overwrites */
89 #ifdef __APC_SMA_DEBUG__
90 int id; /* identifier for the memory block */
94 The BLOCKAT macro turns an offset into an actual address for you:
96 #define BLOCKAT(offset) ((block_t*)((char *)shmaddr + offset))
98 where shmaddr = sma_shaddrs[0]
100 And the OFFSET macro goes the other way:
102 #define OFFSET(block) ((int)(((char*)block) - (char*)shmaddr))
104 Allocating a block with a call to apc_sma_allocate() walks through the
105 linked list of blocks until it finds one that is >= to the requested size.
106 The first call to apc_sma_allocate() will hit the second block. We then
107 chop up that block so it looks like this:
109 +--------+-------+-------+-------------------------+
110 | header | block | block | block |
111 +--------+-------+-------+-------------------------+
113 Then we unlink that block from the linked list so it won't show up
114 as an available block on the next allocate. So we actually have:
116 +--------+-------+ +-------------------------+
117 | header | block |------>| block |
118 +--------+-------+ +-------------------------+
120 And header->avail along with block->size of the remaining large
121 block are updated accordingly. The arrow there representing the
122 link which now points to a block with an offset further along in
125 When the block is freed using apc_sma_deallocate() the steps are
126 basically just reversed. The block is put back and then the deallocate
127 code looks at the block before and after to see if the block immediately
128 before and after are free and if so the blocks are combined. So you never
129 have 2 free blocks next to each other, apart from at the front with that
130 0-sized dummy block. This mostly prevents fragmentation. I have been
131 toying with the idea of always allocating block at 2^n boundaries to make
132 it more likely that they will be re-used to cut down on fragmentation further.
133 That's what the POWER_OF_TWO_BLOCKSIZE you see in apc_sma.c is all about.
135 Of course, anytime we fiddle with our shared memory segment we lock using
136 the locking macros, LOCK() and UNLOCK().
138 That should mostly take care of the low-level shared memory handling.
140 6. Next up is apc_main.c and apc_cache.c which implement the meat of the
143 The apc_main.c file mostly calls functions in apc_sma.c to allocate memory
144 and apc_cache.c for actual cache manipulation.
146 After the shared memory segment is created and the caches are initialized,
147 apc_module_init() installs the my_compile_file() function overriding Zend's
148 version. I'll talk about my_compile_file() and the rest of apc_compile.c
149 in the next section. For now I will stick with apc_main.c and apc_cache.c
150 and talk about the actual caches. A cache consists of a block of shared
151 memory returned by apc_sma_allocate() via apc_sma_malloc(). You will
152 notice references to apc_emalloc(). apc_emalloc() is just a thin wrapper
153 around PHP's own emalloc() function which allocates per-process memory from
154 PHP's pool-based memory allocator. Don't confuse apc_emalloc() and
155 apc_sma_malloc() as the first is per-process and the second is shared memory.
157 The cache is stored in/described by this struct allocated locally using
161 void* shmaddr; /* process (local) address of shared cache */
162 header_t* header; /* cache header (stored in SHM) */
163 slot_t** slots; /* array of cache slots (stored in SHM) */
164 int num_slots; /* number of slots in cache */
165 int gc_ttl; /* maximum time on GC list for a slot */
166 int ttl; /* if slot is needed and entry's access time is older than this ttl, remove it */
169 Whenever you see functions that take a 'cache' argument, this is what they
170 take. And apc_cache_create() returns a pointer to this populated struct.
172 At the beginning of the cache we have a header. Remember, we are down a level now
173 from the sma stuff. The sma stuff is the low-level shared-memory allocator which
174 has its own header which is completely separate and invisible to apc_cache.c.
175 As far as apc_cache.c is concerned the block of memory it is working with could
176 have come from a call to malloc().
178 The header looks like this:
180 typedef struct header_t header_t;
182 int num_hits; /* total successful hits in cache */
183 int num_misses; /* total unsuccessful hits in cache */
184 slot_t* deleted_list; /* linked list of to-be-deleted slots */
187 Since this is at the start of the shared memory segment, these values are accessible
188 across all the yapache processes and hence access to them has to be locked.
190 After the header we have an array of slots. The number of slots is user-defined
191 through the apc.num_slots ini hint. Each slot is described by:
193 typedef struct slot_t slot_t;
195 apc_cache_key_t key; /* slot key */
196 apc_cache_entry_t* value; /* slot value */
197 slot_t* next; /* next slot in linked list */
198 int num_hits; /* number of hits to this bucket */
199 time_t creation_time; /* time slot was initialized */
200 time_t deletion_time; /* time slot was removed from cache */
201 time_t access_time; /* time slot was last accessed */
204 The slot_t *next there is a linked list to other slots that happened to hash to the
207 apc_cache_insert() shows what happens on a new cache insert.
209 slot = &cache->slots[hash(key) % cache->num_slots];
211 cache->slots is our array of slots in the segment. hash() is simply:
213 static unsigned int hash(apc_cache_key_t key)
215 return key.data.file.device + key.data.file.inode;
218 That is, we use the file's device and inode to uniquely identify it. Initially
219 we had used the file's full path, but getting that requires a realpath() call which
220 is amazingly expensive since it has to stat each component of the path to resolve
221 symlinks and get rid of relative path components. By using the device+inode we
222 can uniquely identify a file with a single stat.
224 So, on an insert we find the array position in the slots array by hasing the device+inode.
225 If there are currently no other slots there, we just create the slot and stick it into
228 *slot = make_slot(key, value, *slot, t)
230 If there are other slots already at this position we walk the link list to get to
231 the end. Here is the loop:
234 if (key_equals((*slot)->key.data.file, key.data.file)) {
235 /* If existing slot for the same device+inode is different, remove it and insert the new version */
236 if ((*slot)->key.mtime != key.mtime) {
237 remove_slot(cache, slot);
242 } else if(cache->ttl && (*slot)->access_time < (t - cache->ttl)) {
243 remove_slot(cache, slot);
246 slot = &(*slot)->next;
249 That first key_equals() check sees if we have an exact match meaning the file
250 is already in the cache. Since we try to find the file in the cache before doing
251 an insert, this will generally only happen if another process managed to beat us
252 to inserting it. If we have a newer version of the file at this point we remove
253 it an insert the new version. If our version is not newer we just return without
256 While walking the linked list we also check to see if the cache has a TTL defined.
257 If while walking the linked list we see a slot that has expired, we remove it
258 since we are right there looking at it. This is the only place we remove stale
259 entries unless the shared memory segment fills up and we force a full expunge via
260 apc_cache_expunge(). apc_cache_expunge() walks the entire slots array and walks
261 down every linked list removing stale slots to free up room. This is obviously
262 slow and thus only happens when we have run out of room.
264 apc_cache_find() simply hashes and returns the entry if it is there. If it is there
265 but older than the mtime in the entry we are looking for, we delete the one that is
266 there and return indicating we didn't find it.
268 Next we need to understand what an actual cache entry looks like. Have a look at
269 apc_cache.h for the structs. I sort of glossed over the key part earlier saying
270 that we just used the device+inode to find a hash slot. It is actually a bit more
271 complex than that because we have two kinds of caches. We have the standard file
272 cache containing opcode arrays, but we also have a user-controlled cache that the
273 user can insert whatever they want into via apc_store(). For the user cache we
274 obviously don't have a device+inode. The actual identifier is provided by the user
275 as a char *. So the key is actually a union that looks like this:
277 typedef union _apc_cache_key_data_t {
279 int device; /* the filesystem device */
280 int inode; /* the filesystem inode */
285 } apc_cache_key_data_t;
287 struct apc_cache_key_t {
288 apc_cache_key_data_t data;
289 int mtime; /* the mtime of this cached entry */
292 And we have two sets of functions to do inserts and finds. apc_cache_user_find()
293 and apc_cache_user_insert() operate on the user cache.
295 Ok, on to the actual cache entry. Again, because we have two kinds of caches, we
296 also have the corresponding two kinds of cache entries described by this union:
298 typedef union _apc_cache_entry_value_t {
300 char *filename; /* absolute path to source file */
301 zend_op_array* op_array; /* op_array allocated in shared memory */
302 apc_function_t* functions; /* array of apc_function_t's */
303 apc_class_t* classes; /* array of apc_class_t's */
310 } apc_cache_entry_value_t;
312 And then the actual cache entry:
314 struct apc_cache_entry_t {
315 apc_cache_entry_value_t data;
320 The user entry is pretty simple and not all that important for now. I will
321 concentrate on the file entries since that is what holds the actual compiled
322 opcode arrays along with the functions and classes required by the executor.
324 apc_cache_make_file_entry() in apc_cache.c shows how an entry is constructed.
325 The main thing to understand here is that we need more than just the opcode
326 array, we also need the functions and classes created by the compiler when it
327 created the opcode array. As far as the executor is concerned, it doesn't know
328 that it isn't operating in normal mode being called right after the parse/compile
329 phase, so we need to recreate everything so it looks exactly like it would at
332 7. my_compile_file() and apc_compile.c
334 my_compile_file() in apc_main.c controls where we get the opcodes from. If
335 the user-specified filters exclude the file from being cached, then we just
336 call the original compile function and return. Otherwise we fetch the request
337 time from Apache to avoid an extra syscall, create the key so we can look up
338 the file in the cache. If we find it we stick it on a local stack which we
339 use at cleanup time to make sure we return everything back to normal after a
340 request and call cached_compile() which installs the functions and classes
341 associated with the op_array in this entry and then copy the op_array down
342 into our memory space for execution.
344 If we didn't find the file in the cache, we need to compile it and insert it.
345 To compile it we simply call the original compile function:
347 op_array = old_compile_file(h, type TSRMLS_CC);
349 To do the insert we need to copy the functions, classes and the opcode array
350 the compile phase created into shared memory. This all happens in apc_compile.c
351 in the apc_copy_op_array(), apc_copy_new_functions() and apc_copy_new_classes()
352 functions. Then we make the file entry and do the insert. Both of these
353 operations were described in the previous section.
357 The optimizer has been deprecated.
359 If you made it to the end of this, you should have a pretty good idea of where things are in
360 the code. I skimmed over a lot of things, so plan on spending some time reading through the code.