$config['site']['logperf'] = true; // to record & dump total hits of each type and the runtime to syslog
$config['site']['logperf_detail'] = true; // very verbose -- dump the individual cache keys and queries as they get used (may contain private info in some queries)
Seeing 180 cache gets on a timeline page seems not unusual currently; since these run in serial, even relatively small roundtrip times can add up heavily.
We should consider ways to reduce the number of round trips, such as more frequently storing compound objects or the output of processing in memcached.
Doing parallel multi-key lookups could also help by collapsing round-trip times, but might not be easy to fit into SN's object model. (For things like streams this should actually work pretty well -- grab the list, then when it's returned go grab all the individual items in parallel and return the list)
If a cache entry is dependent on the code that's running, upgrading
(or enabling/disabling plugins) can generate hard-to-track
inconsistencies.
This change adds a close-to-unique fingerprint of the running code to
some cache keys, so that if the fingerprint changes, the old values
are ignored and new values are used.
If the automated uniqueness fails, an administrator can add an extra
config value, $config['site']['build'], that's thrown into the key also.
- Multiplexing queues into groups and for multiple sites.
- Sharing vs breakout configurable per site and per queue via $config['queue']['breakout']
- Detect how many times a message is redelivered, discard if it's killed too many daemons
- count configurable with $config['queue']['max_retries']
- can dump the items to files in $config['queue']['dead_letter_dir']
Queue daemon memory & resource leak fixes:
- avoid unnecessary reconnections to memcached server (switch persistent connections back in on second initialization, assuming it's child process)
- monkey-patch for leaky .ini loads in DB_DataObject::databaseStructure() - was leaking 200k per active switch
- applied leak fixes to Status_network as well, using intermediate base Safe_DataObject for both it and Memcache_DataObject
Misc queue fixes:
- correct handling of child processes exiting due to signal termination instead of regular exit
- shutdown instead of infinite respawn loop if we're already past the soft memory limit at startup
- Added --all option for xmppdaemon... still opens one xmpp connection per site that has xmpp active
Cache updates:
- add Cache::increment() method with native support for memcached atomic increment
Key changes:
* Initialization code moved from common.php to StatusNet class;
can now switch configurations during runtime.
* As a consequence, configuration files must now be idempotent...
Be careful with constant, function or class definitions.
* Control structure for daemons/QueueManager/QueueHandler has been refactored;
the run loop is now managed by IoMaster run via scripts/queuedaemon.php
IoManager subclasses are woken to handle socket input or polling, and may
cover multiple sites.
* Plugins can implement notice queue handlers more easily by registering a
QueueHandler class; no more need to add a daemon.
The new QueueDaemon runs from scripts/queuedaemon.php:
* This replaces most of the old *handler.php scripts; they've been refactored
to the bare handler classes.
* Spawns multiple child processes to spread load; defaults to CPU count on
Linux and Mac OS X systems, or override with --threads=N
* When multithreaded, child processes are automatically respawned on failure.
* Threads gracefully shut down and restart when passing a soft memory limit
(defaults to 90% of memory_limit), limiting damage from memory leaks.
* Support for UDP-based monitoring: http://www.gitorious.org/snqmon
Rough control flow diagram:
QueueDaemon -> IoMaster -> IoManager
QueueManager [listen or poll] -> QueueHandler
XmppManager [ping & keepalive]
XmppConfirmManager [poll updates]
Todo:
* Respawning features not currently available running single-threaded.
* When running single-site, configuration changes aren't picked up.
* New sites or config changes affecting queue subscriptions are not yet
handled without a daemon restart.
* SNMP monitoring output to integrate with general tools (nagios, ganglia)
* Convert XMPP confirmation message sends to use stomp queue instead of polling
* Convert xmppdaemon.php to IoManager?
* Convert Twitter status, friends import polling daemons to IoManager
* Clean up some error reporting and failure modes
* May need to adjust queue priorities for best perf in backlog/flood cases
Detailed code history available in my daemon-work branch:
http://www.gitorious.org/~brion/statusnet/brion-fixes/commits/daemon-work
* We now cache negative lookups; clear them in Memcached_DataObject->insert()
* Mark file.url as a unique key in statusnet.ini so its negative lookups are cleared properly (first save of a notice with a new URL was failing due to double-insert)
* Now using serialization for default in-process cache instead of just saving objects; avoids potential corruption if you save an object to cache, change the original object, then fetch the same key from cache again