We were passing DOM nodes directly into the queues for the final bookmark import stage; unfortunately these don't actually survive serialization.
Moved the extraction of properties from the HTML up to the first-stage handler, so now we don't have to worry about moving DOM nodes from one handler to the next. Instead passing an associative array of properties, which is fed into the Bookmark::saveNew by the per-bookmark handler.
delicious bookmark exports use the godawful HTML bookmark file format that ancient versions of Netscape used (and has thus been the common import/export format for bookmarks since the dark ages of the web :)
This arranges bookmark entries as an HTML definition list, using a lot of implied close tags (leaving off the </dt> and </dd>).
DOMDocument->loadHTML() uses libxml2's HTML mode, which generally does ok with muddling through things but apparently is really, really bad about handling those implied close tags.
Sequences of adjacent <dt> elements (eg bookmark without a description, followed by another bookmark "<dt><dt>"), end up interpreted as nested ("<dt><dt></dt></dt>") instead of as siblings ("<dt></dt><dt></dt>").
The first round of code tried to resolve the nesting inline, but ended up a bit funky in places.
I've replaced this with a standalone run through the data to re-order the elements, based on our knowing that <dt> and <dd> cannot directly contain one another; once that's done, our main logic loop can be a bit cleaner. I'm not 100% sure it's doing nested sublists correctly, but these don't seem to show up in delicious export (and even if they do, with the way we flatten the input it shouldn't make a difference).
Also fixed a clearer edge case where some bookmarks didn't get imported when missing descriptions.
I was trying to generate URIs for Bookmarks based on (profile, crc32(url), created).
I failed at that. CRC32s are unsigned ints, and our schema code didn't like that.
On top of that, my code to encode and restore created timestamps was problematic.
So, I switched back to using a meaningless unique ID for Bookmarks.
One way to do this would be to use an auto-incrementing integer ID. However, we've been
kind of crabbed out a few times for exposing auto-incrementing integer IDs as URIs, so
I thought maybe using a random UUID would be a better way to do it.
So, this patch sets random UUIDs for URIs of bookmarks.
Had some problems with PuSH and Salmon use of Bookmarks; they were
being required to generate Atom versions of the bookmark _before_ the bookmark was saved.
So, I reversed the order of how things are saved, and associate notices and bookmarks
by URI rather than notice_id.
Form for saving bookmarks that looks like the delicious.com form.
Save a new notice with the right text, but attach a new notice_bookmark
table which marks this as a bookmark. Tags, URLs are kept the same.
Fixes for Twitter bridge breakage on 32-bit servers. New "Snowflake" 64-bit IDs have become too big to fit in the integer portion of double-precision floats, so to reliably use these IDs we need to pull the new string form now.
Machines with 64-bit PHP installation should have had no problems (except on Windows, where integers are still 32 bits)
Conflicts:
plugins/TwitterBridge/twitterimport.php <- as this hasn't been broken out, the import code is NOT FULLY UPDATED HERE.
Some of our caching systems, like the disk cache or memcached, have
significant overhead (network connections or disk I/O).
This plugin adds an additional layer of in-process cache, so we don't
need to reconnect to external cache systems when we've already
received a data item from the cache. There are some concurrency issues
here, but typically they won't be important at the level of a single
web hit.
Piwik's current default recommended JS for loading creates a <script> tag via document.write(). In addition to being generally evil, this means the browser doesn't know it's going to need piwik.js until that chunk of script gets executed... which can't happen until all scripts referenced *before* it have been loaded and executed.
The only reason for that bit of script though seems to be to pick 'http' or 'https' depending on the current page's scheme. This can be done more simply by using a protocol-relative link (eg "//piwik.status.net/piwik.js"), which the browser will resolve as appropriate. Since it's now sitting in the <script> tag, the browser's lookahead code will now see it and be able to start loading it while earlier things are parsing/executing.
May be better still to move to an asynchronous load after DOM-ready, but I'm not sure if that'll screw with the analytics code (eg, not being able to start things on the DOM-ready events since they're past).
The default full build of OpenLayers.js is 943kb as of 2.10; this gzips down to a couple hundred kb
but is still rather nasty, plus loading it off a remote host could slow things down.
Using a local copy let us cut down the size significantly by discarding unused features, and further
minification with yui-compressor shaves a bit more off. Cuts down to about 1/5 the size of the
original.
Also threw in a bundled & minified copy of the Mapstraction classes plus our usermap.js,
which covers the common case of using the default OpenLayers provider. This cuts out three
additional script loads, two of which weren't getting launched until after the mxn.js main
file got loaded.
Included Makefile will recreate the OpenLayers.js using the statusnet.cfg strip configuration file
and yui-compressor to do some extra minification at the end. Requires fetching the OpenLayers
source download and dropping it in:
http://openlayers.org/download/OpenLayers-2.10.tar.gz
$config['twitter']['ignore_errors'] = true;
A longer-term solution is to patch up the indirect retry handling to count retries better, or delay for later retry sensibly.
common_shorten_links() can only access the web session's logged-in user, so never properly took user options into effect for posting via XMPP, API, mail, etc.
Adds an optional $user parameter on common_shorten_links(), and a $user->shortenLinks() as a clearer interface for that.
Tweaked some lower-level functions so $user gets passed down -- making the $notice_id param previously there for saving URLs at notice save time generalized a little.
Note also ticket #2919: there's a lot of duplicate code calling the shortening, checking the length, and reporting near-identical error messages. These should be consolidated to aid in code and translation maintenance.
Separating the two forms (one to create a local account, the other to attach the OpenID to an existing account) gets them working -- enter activates the appropriate default button.
We were clearing the counter on the window title in the blur event, which gets fired *after* we switch away, thus triggering Firefox to mark the tab as updated again.
Clearing the counter on *focus* instead avoids this, and keeps the counter out of the way as well.
Identified several bugs and fixmes, and added more thorough labeling of the issues with replicating the entire HTML structure of notices (no i18n, missing new features, maintenance problems, possible other issues)
Most annoying error case being where the notice was already faved or deleted on Twitter! :)
Such errors will now just fail out and log a note to the syslog -- the rest of what we were doing will continue on unhindered, so you can still delete, favorite, etc and it just won't sync the info over in that case.
This option may be useful for intranet sites that don't have direct access to the internet, as they may be unable to successfully fetch those resources.
When the retweet failed with a 403 error (say due to it being a private tweet, which can't be retweeted) we would end up mishandling the return value from our internal error handling.
Instead of correctly discarding the message and closing out the queue item, we ended up trying to save a bogus twitter<->local ID mapping, which threw another exception and lead the queue system to re-run it.
- Fixed the logic check and return values for the retweet case in broadcast_twitter().
- Added doc comments explaining the return values on some functions in twitter.php
- Added check on Notice_to_status::saveNew() for empty input -- throw an exception before we try to actually insert into db. :)
Added the necessary classes to send email summaries. First, added a
script to run on a daily basis. Second, added a queue handler for
sending email summaries for users, and another to queue summaries for
all users on the site. Fixed up the email_summary_status table to
store the last-sent notice id, rather than a datetime (since we don't
support 'since' parameters anymore). Finally, made the plugin class
load the right modules when needed.
Pulled common code for the profile page and profile list cases to give them the same logic on checking. Also fixes the problem that you'd get a flag button for yourself in profile lists, while we explicitly exclude that from the profile page -- it's now skipped in both places.
StatusNet core code now sets the tooltip text on .attachment.more links when they receive their attachment-expansion magic; this will override the hardcoded tooltip text saved from OStatus plugin when displaying timelines in the web UI.
Data are sent to the 'info' level of logging, like so:
[lazarus.local:4812.86b23603 GET /mublog/api/statuses/friends_timeline.atom?since_id=1353]
STATLOG action:apitimelinefriends method:GET ssl:no query:since_id cookie:no auth:yes
ifmatch:no ifmod:no agent:Appcelerator Titanium/1.4.1 (iPhone/4.1; iPhone OS; en_US;)
Fields:
* action: case-normalized name of the action class we're acting on
* method: GET, POST, HEAD, etc
* ssl: Are we on HTTPS? 'yes' or 'no'
* query: Were we sent a query string? 'yes', 'no', or 'since_id' if the only parameter is a since_id
* cookie: Were we sent any cookies? 'yes' or 'no'
* auth: Were we sent an HTTP Authorization header? 'yes' or 'no'
* ifmatch: Were we sent an HTTP If-Match header for an ETag? 'yes' or 'no'
* ifmod: Were we sent an HTTP If-Modified-Since header? 'yes' or 'no'
* agent: User-agent string, to aid in figuring out what these things are
The most shared-cache-friendly requests will be non-SSL GET requests with no or very predictable
query parameters, no cookies, and no authorization headers. Private caching (eg within a supporting
user-agent) could still be friendly to SSL and auth'd GET requests.
We kind of expect that the most frequent hits from clients will be GETs for a few common timelines,
with auth headers, a since_id-only query, and no cookies. These should at least be amenable to
returning 304 matches for etags or last-modified headers with private caching, but it's very
possible that most clients won't actually think to save and send them. That would leave us expecting
to handle a lot of timeline since_id hits that return a valid API response with no notices.
At this point we don't expect to actually see if-match or if-modified-since a lot since most of our
API responses are marked as uncacheable; so even if we output them they're not getting sent back to
us.
Random subsampling can be enabled by setting the 'frequency' parameter smaller than 1.0:
addPlugin('ApiLogger', array(
'frequency' => 0.5 // Record 50% of API hits
));