Moved much of the writing that happens when posting a notice to a new
queuehandler, distribqueuehandler. This updates tags, groups, replies
and inboxes at queue time (or at Web time, if queues are disabled).
To make this work well, I had to break up the monolithic
Notice::blowCaches() and make cache blowing happen closer to where
data is updated.
Squashed commit of the following:
commit 5257626c62750ac4ac1db0ce2b71410c5711cfa3
Author: Evan Prodromou <evan@status.net>
Date: Mon Jan 25 14:56:41 2010 -0500
slightly better handling of blowing tag memory cache
commit 8a22a3cdf6ec28685da129a0313e7b2a0837c9ef
Author: Evan Prodromou <evan@status.net>
Date: Mon Jan 25 01:42:56 2010 -0500
change 'distribute' to 'distrib' so not too long for dbqueue
commit 7a063315b0f7fad27cb6fbd2bdd74e253af83e4f
Author: Evan Prodromou <evan@status.net>
Date: Mon Jan 25 01:39:15 2010 -0500
change handle_notice() to handle() in distributqueuehandler
commit 1a39ccd28b9994137d7bfd21bb4f230546938e77
Author: Evan Prodromou <evan@status.net>
Date: Mon Jan 25 16:05:25 2010 -0500
error with queuemanager
commit e6b3bb93f305cfd2de71a6340b8aa6fb890049b7
Author: Evan Prodromou <evan@status.net>
Date: Mon Jan 25 01:11:34 2010 -0500
Blow memcache at different point rather than one big function for Notice class
commit 94d557cdc016187d1d0647ae1794cd94d6fb8ac8
Author: Evan Prodromou <evan@status.net>
Date: Mon Jan 25 00:48:44 2010 -0500
Blow memcache at different point rather than one big function for Notice class
commit 1c781dd08c88a35dafc5c01230b4872fd6b95182
Author: Evan Prodromou <evan@status.net>
Date: Wed Jan 20 08:54:18 2010 -0500
move broadcasting and distributing to new queuehandler
commit da3e46d26b84e4f028f34a13fd2ee373e4c1b954
Author: Evan Prodromou <evan@status.net>
Date: Wed Jan 20 08:53:12 2010 -0500
Move distribution of notices to new distribute queue handler
I made a bad merge on Jan 10th from master to 0.9.x. This lost a number
of memcache enhancements made on the 0.9.x branch. I've been able to
re-do the manual merge, and this represents the changes. Most of them
are related to caching on insert.
Queue handlers for XMPP individual & firehose output now send their XML stanzas
to another output queue instead of connecting directly to the chat server. This
lets us have as many general processing threads as we need, while all actual
XMPP input and output go through a single daemon with a single connection open.
This avoids problems with multiple connected resources:
* multiple windows shown in some chat clients (psi, gajim, kopete)
* extra load on server
* incoming message delivery forwarding issues
Database changes:
* queue_item drops 'notice_id' in favor of a 'frame' blob.
This is based on Craig Andrews' work branch to generalize queues to take any
object, but conservatively leaving out the serialization for now.
Table updater (preserves any existing queued items) in db/rc3to09.sql
Code changes to watch out for:
* Queue handlers should now define a handle() method instead of handle_notice()
* QueueDaemon and XmppDaemon now share common i/o (IoMaster) and respawning
thread management (RespawningDaemon) infrastructure.
* The polling XmppConfirmManager has been dropped, as the message is queued
directly when saving IM settings.
* Enable $config['queue']['debug_memory'] to output current memory usage at
each run through the event loop to watch for memory leaks
To do:
* Adapt XMPP i/o to component connection mode for multi-site support.
* XMPP input can also be broken out to a queue, which would allow the actual
notice save etc to be handled by general queue threads.
* Make sure there are no problems with simply pushing serialized Notice objects
to queues.
* Find a way to improve interactive performance of the database-backed queue
handler; polling is pretty painful to XMPP.
* Possibly redo the way QueueHandlers are injected into a QueueManager. The
grouping used to split out the XMPP output queue is a bit awkward.
Conflicts:
scripts/xmppdaemon.php
Queue handlers for XMPP individual & firehose output now send their XML stanzas
to another output queue instead of connecting directly to the chat server. This
lets us have as many general processing threads as we need, while all actual
XMPP input and output go through a single daemon with a single connection open.
This avoids problems with multiple connected resources:
* multiple windows shown in some chat clients (psi, gajim, kopete)
* extra load on server
* incoming message delivery forwarding issues
Database changes:
* queue_item drops 'notice_id' in favor of a 'frame' blob.
This is based on Craig Andrews' work branch to generalize queues to take any
object, but conservatively leaving out the serialization for now.
Table updater (preserves any existing queued items) in db/rc3to09.sql
Code changes to watch out for:
* Queue handlers should now define a handle() method instead of handle_notice()
* QueueDaemon and XmppDaemon now share common i/o (IoMaster) and respawning
thread management (RespawningDaemon) infrastructure.
* The polling XmppConfirmManager has been dropped, as the message is queued
directly when saving IM settings.
* Enable $config['queue']['debug_memory'] to output current memory usage at
each run through the event loop to watch for memory leaks
To do:
* Adapt XMPP i/o to component connection mode for multi-site support.
* XMPP input can also be broken out to a queue, which would allow the actual
notice save etc to be handled by general queue threads.
* Make sure there are no problems with simply pushing serialized Notice objects
to queues.
* Find a way to improve interactive performance of the database-backed queue
handler; polling is pretty painful to XMPP.
* Possibly redo the way QueueHandlers are injected into a QueueManager. The
grouping used to split out the XMPP output queue is a bit awkward.
Queue handlers for XMPP individual & firehose output now send their XML stanzas
to another output queue instead of connecting directly to the chat server. This
lets us have as many general processing threads as we need, while all actual
XMPP input and output go through a single daemon with a single connection open.
This avoids problems with multiple connected resources:
* multiple windows shown in some chat clients (psi, gajim, kopete)
* extra load on server
* incoming message delivery forwarding issues
Database changes:
* queue_item drops 'notice_id' in favor of a 'frame' blob.
This is based on Craig Andrews' work branch to generalize queues to take any
object, but conservatively leaving out the serialization for now.
Table updater (preserves any existing queued items) in db/rc3to09.sql
Code changes to watch out for:
* Queue handlers should now define a handle() method instead of handle_notice()
* QueueDaemon and XmppDaemon now share common i/o (IoMaster) and respawning
thread management (RespawningDaemon) infrastructure.
* The polling XmppConfirmManager has been dropped, as the message is queued
directly when saving IM settings.
* Enable $config['queue']['debug_memory'] to output current memory usage at
each run through the event loop to watch for memory leaks
To do:
* Adapt XMPP i/o to component connection mode for multi-site support.
* XMPP input can also be broken out to a queue, which would allow the actual
notice save etc to be handled by general queue threads.
* Make sure there are no problems with simply pushing serialized Notice objects
to queues.
* Find a way to improve interactive performance of the database-backed queue
handler; polling is pretty painful to XMPP.
* Possibly redo the way QueueHandlers are injected into a QueueManager. The
grouping used to split out the XMPP output queue is a bit awkward.
- NOTICE_INBOX_SOURCE_* constants moved to common.php since Notice_inbox.php not always loaded
- fixed typo in User::staticGet() call which caused user #1 to receive messages once for each subscriber instead of for him/herself
- 'continue' -> 'continue 2' inside switch() statement to fix loop escape (PHP considers switch() a looping construct for break & continue)
Key changes:
* Initialization code moved from common.php to StatusNet class;
can now switch configurations during runtime.
* As a consequence, configuration files must now be idempotent...
Be careful with constant, function or class definitions.
* Control structure for daemons/QueueManager/QueueHandler has been refactored;
the run loop is now managed by IoMaster run via scripts/queuedaemon.php
IoManager subclasses are woken to handle socket input or polling, and may
cover multiple sites.
* Plugins can implement notice queue handlers more easily by registering a
QueueHandler class; no more need to add a daemon.
The new QueueDaemon runs from scripts/queuedaemon.php:
* This replaces most of the old *handler.php scripts; they've been refactored
to the bare handler classes.
* Spawns multiple child processes to spread load; defaults to CPU count on
Linux and Mac OS X systems, or override with --threads=N
* When multithreaded, child processes are automatically respawned on failure.
* Threads gracefully shut down and restart when passing a soft memory limit
(defaults to 90% of memory_limit), limiting damage from memory leaks.
* Support for UDP-based monitoring: http://www.gitorious.org/snqmon
Rough control flow diagram:
QueueDaemon -> IoMaster -> IoManager
QueueManager [listen or poll] -> QueueHandler
XmppManager [ping & keepalive]
XmppConfirmManager [poll updates]
Todo:
* Respawning features not currently available running single-threaded.
* When running single-site, configuration changes aren't picked up.
* New sites or config changes affecting queue subscriptions are not yet
handled without a daemon restart.
* SNMP monitoring output to integrate with general tools (nagios, ganglia)
* Convert XMPP confirmation message sends to use stomp queue instead of polling
* Convert xmppdaemon.php to IoManager?
* Convert Twitter status, friends import polling daemons to IoManager
* Clean up some error reporting and failure modes
* May need to adjust queue priorities for best perf in backlog/flood cases
Detailed code history available in my daemon-work branch:
http://www.gitorious.org/~brion/statusnet/brion-fixes/commits/daemon-work
(We need to keep it returning a reference because the extlib parent class is stuck in PHP 4-land and uses references everywhere, including this function's return value. Yuck!)
Also changed pkeyGet to drop the reference, since it doesn't have an upstream equivalent.
* added ProfileDeleteRelated event to match UserDeleteRelated, to allow plugins to add extra related tables on profile deletion
* UserFlagPlugin: deleting flags when target profile is deleted
* UserFlagPlugin: deleting flags when flagging user is deleted
* UserFlagPlugin: fix for autoloader -- class names are case-insensitive. We may get lowercase class names coming in at times, such as when creating DB objects programatically from a table name.
Note that any already-existing bogus entries need to be removed from the database:
select * from user_flag_profile where (select id from profile where id=profile_id) is null;
select * from user_flag_profile where (select id from user where id=user_id) is null;
In web interface and retweet/repeat API we show the original untrimmed text, but some back-compat API messages will still show the trimmed 'RT' version.
This matches Twitter's behavior on overlong retweets, though we're outputting the RT version in more API results than they do.
In web interface and retweet/repeat API we show the original untrimmed text, but some back-compat API messages will still show the trimmed 'RT' version.
This matches Twitter's behavior on overlong retweets, though we're outputting the RT version in more API results than they do.
* We now cache negative lookups; clear them in Memcached_DataObject->insert()
* Mark file.url as a unique key in statusnet.ini so its negative lookups are cleared properly (first save of a notice with a new URL was failing due to double-insert)
* Now using serialization for default in-process cache instead of just saving objects; avoids potential corruption if you save an object to cache, change the original object, then fetch the same key from cache again
There's great value in knowing that something doesn't exist. We
now cache this information, and carefully compare the results from
cache as $results !== false instead of !empty($results), since some
empty values (null, 0, empty array, empty string) are stored in the
cache.
Caching staticGet() and pkeyGet() now store DB misses in the cache,
and cachedQuery() checks for empty results from the cache.
There were some problems with the automated cache/uncache system
for data objects that made us cache unfindable keys (with null
attributes and sometimes null names). Fixed those problems and
refactored the encache() and decache() methods so they use a helper
to find the cache keys to use.
Moved the important parts of the location-argument-handling stuff
to a single function. Handles defaults and overrides correctly, and
easy to use. Changed Web and API channels to use it.
Moved the important parts of the location-argument-handling stuff
to a single function. Handles defaults and overrides correctly, and
easy to use. Changed Web and API channels to use it.
Sorting on notice.id when our primary selector was notice_inbox.user_id caused a filesort and table scan of the notice table.
Switchng to notice_inbox's notice_id means we can use our index, and everything comes right up.
Before:
mysql> explain SELECT notice.id AS id FROM notice JOIN notice_inbox ON notice.id = notice_inbox.notice_id WHERE notice_inbox.user_id = 18574 AND notice.repeat_of IS NULL ORDER BY notice.id DESC LIMIT 61 OFFSET 0;
+----+-------------+--------------+--------+------------------------------------+---------+---------+-------------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+--------+------------------------------------+---------+---------+-------------------------------+--------+----------------------------------------------+
| 1 | SIMPLE | notice_inbox | ref | PRIMARY,notice_inbox_notice_id_idx | PRIMARY | 4 | const | 102600 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | notice | eq_ref | PRIMARY | PRIMARY | 4 | stoica.notice_inbox.notice_id | 1 | Using index |
+----+-------------+--------------+--------+------------------------------------+---------+---------+-------------------------------+--------+----------------------------------------------+
After:
mysql> explain SELECT notice.id AS id FROM notice JOIN notice_inbox ON notice.id = notice_inbox.notice_id WHERE notice_inbox.user_id = 18574 AND notice.repeat_of IS NULL ORDER BY notice_id DESC LIMIT 61 OFFSET 0;
+----+-------------+--------------+--------+------------------------------------+---------+---------+-------------------------------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+--------+------------------------------------+---------+---------+-------------------------------+--------+--------------------------+
| 1 | SIMPLE | notice_inbox | ref | PRIMARY,notice_inbox_notice_id_idx | PRIMARY | 4 | const | 102816 | Using where; Using index |
| 1 | SIMPLE | notice | eq_ref | PRIMARY,notice_repeatof_idx | PRIMARY | 4 | stoica.notice_inbox.notice_id | 1 | Using where |
+----+-------------+--------------+--------+------------------------------------+---------+---------+-------------------------------+--------+--------------------------+
Sorting on notice.id when our primary selector was notice_inbox.user_id caused a filesort and table scan of the notice table.
Switchng to notice_inbox's notice_id means we can use our index, and everything comes right up.
Before:
mysql> explain SELECT notice.id AS id FROM notice JOIN notice_inbox ON notice.id = notice_inbox.notice_id WHERE notice_inbox.user_id = 18574 AND notice.repeat_of IS NULL ORDER BY notice.id DESC LIMIT 61 OFFSET 0;
+----+-------------+--------------+--------+------------------------------------+---------+---------+-------------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+--------+------------------------------------+---------+---------+-------------------------------+--------+----------------------------------------------+
| 1 | SIMPLE | notice_inbox | ref | PRIMARY,notice_inbox_notice_id_idx | PRIMARY | 4 | const | 102600 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | notice | eq_ref | PRIMARY | PRIMARY | 4 | stoica.notice_inbox.notice_id | 1 | Using index |
+----+-------------+--------------+--------+------------------------------------+---------+---------+-------------------------------+--------+----------------------------------------------+
After:
mysql> explain SELECT notice.id AS id FROM notice JOIN notice_inbox ON notice.id = notice_inbox.notice_id WHERE notice_inbox.user_id = 18574 AND notice.repeat_of IS NULL ORDER BY notice_id DESC LIMIT 61 OFFSET 0;
+----+-------------+--------------+--------+------------------------------------+---------+---------+-------------------------------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+--------+------------------------------------+---------+---------+-------------------------------+--------+--------------------------+
| 1 | SIMPLE | notice_inbox | ref | PRIMARY,notice_inbox_notice_id_idx | PRIMARY | 4 | const | 102816 | Using where; Using index |
| 1 | SIMPLE | notice | eq_ref | PRIMARY,notice_repeatof_idx | PRIMARY | 4 | stoica.notice_inbox.notice_id | 1 | Using where |
+----+-------------+--------------+--------+------------------------------------+---------+---------+-------------------------------+--------+--------------------------+
self-subscription) via the API. Additionally, make it impossible
to block yourself or unsubscribe from yourself, period.
I also made User use the subs.php helper function for unsubscribing
during a block.
Hopefully, these changes will get rid of the problem of people
accidentally deleting their self-subscriptions once and for all
(knock on wood).
* master: (67 commits)
Ticket 2038: fix bad bug tracker link
Fix regression in group posting: bug introduced in commit 1319002e15. Need to use actual profile object rather than an id on a variable that doesn't exist when checking blocks :D
Log database errors when saving notice_inbox entries
Drop the username from the log id for now; seems to trigger an error loop in some circumstances
request id on logs... pid + random id per web request + username + method + url
Add OpenID ini info back into statusnet.ini as a stopgap until we can
Some changes to the OpenID DataObjects to make them emit the exact same
OpenID plugin should set 'user_openid.display' as unique key
Remove relationship: user_openid.user_id -> user.id. I don't think this
Have OpenID plugin DataObjects emit their own .ini info
Revert "Allow plugin DB_DataObject classes to not have to use the .ini file by overriding keys(), table(), and sequenceKey() for them"
Catch and report exceptions from notice_to_omb_notice() instead of letting the OMB queue handler die.
Fix regression in remote subscription; added hasRole() shadow method on Remote_profile.
Fix fatal error on OMB subscription for first-timers
Remove annoying log msg
Drop error message on setlocale() failure; this is harmless, since we actually have a working locale set up.
Catch uncaught exception
Fixed bug where reply-sync bit wasn't getting saved
Forgot to render the nav menu when on FB Connect login tab
Facebook plugin no longer takes over Login and Connect settings nav menus
...
Conflicts:
db/08to09_pg.sql
db/statusnet_pg.sql
locale/pt_BR/LC_MESSAGES/statusnet.mo
plugins/Mapstraction/MapstractionPlugin.php
DB_DataObject hides errors by silently returning null for any non-existent method call, making it harder to tell what the heck's going on... the rights check for blocked remote users returned null for the check for subscribe rights, thus eval'ing to false. We now log a note in this circumstance, which would have cut about 3 hours off of the debug time.
DB_DataObject hides errors by silently returning null for any non-existent method call, making it harder to tell what the heck's going on... the rights check for blocked remote users returned null for the check for subscribe rights, thus eval'ing to false. We now log a note in this circumstance, which would have cut about 3 hours off of the debug time.
Added a right for new notices, realized that the hasRight() method
should be on the profile, and moved it.
Makes this a less atomic commit but that's the way it goes sometimes.
Added EmailAuthenticationPlugin
Added ReverseUsernameAuthenticationPlugin
Changed the StartChangePassword and EndChangePassword events to take a user, instead of a nickname
User::allowed_nickname was declared non-static, but used as if it was static, so I made the declaration static
Upgrade notes:
* Index names have changed from hardcoded 'Identica_people' and 'Identica_notices' to use the database name and actual table names. Must reindex.
New events:
* GetSearchEngine to override default search engine class selection from plugins
New scripts:
* gen_config.php generates a sphinx.conf from database configuration (with theoretical support for status_network table, but it doesn't seem to be cleanly queriable right now without knowing the db setup info for that. Needs generalized support.)
* Replaced old sphinx-indexer.sh and sphinx-cron.sh with index_update.php
Other fixes:
* sphinx.conf.sample better matches our live config, skipping unused stopword list and using a more realistic indexer memory limit
Further notes:
* Probably doesn't work right with PostgreSQL yet; Sphinx can pull from PG but the extraction queries currently look like they use some MySQL-specific functions.
For various reasons, it's nicer to have a class for theme-file paths
and such. So, I've rewritten the code for determining the locations of
theme files to be more OOPy.
I changed all the uses of the two functions in the module (theme_file
and theme_path) to use Theme::file and Theme::path respectively.
I've also removed the code in common.php that require's the module;
using a class means we can autoload it instead.
The User_openid data object was explicitly listed as a related field to delete from in User::delete(); this class doesn't exist anymore by default since OpenID was broken out to a plugin.
Added UserDeleteRelated event for plugins to add related tables to delete from at user delete time.
Caching support will be added in future work after unit tests have been added.
* extlib: add PEAR HTTP_Request2 0.4.1 alpha
* extlib: update PEAR Net_URL2 to 0.3.0 beta for HTTP_Request2 compatibility
* moved direct usage of CURL and file_get_contents to HTTPClient class, excluding external-sourced libraries
* adapted GeonamesPlugin for new HTTPResponse interface
Note some plugins haven't been fully tested yet.
Caching support will be added in future work after unit tests have been added.
* extlib: add PEAR HTTP_Request2 0.4.1 alpha
* extlib: update PEAR Net_URL2 to 0.3.0 beta for HTTP_Request2 compatibility
* moved direct usage of CURL and file_get_contents to HTTPClient class, excluding external-sourced libraries
Note some plugins haven't been tested yet.
This reverts commit 15f9c80c28.
So, so, elegant! And so, so, incorrect!
We can't have a user named 'notice' because that would interfere with
URLs like /notice/1234. However, there is no file named 'notice' in
the Web root.
If there were a way to automatically pull out the virtual paths in the
root dir, this may make sense. Until then, we keep track here.
Canon urls that have a protocol followed by a host (and no path) automatcally get a trailing slash by the canon function - make the unit test match that