<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<title>Optimizations on Load Balance System and Storage Usage - Tech Report - GNU social Summer of Code 2019</title>
<link rel="icon" href="../../favicon.png">
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" href="https://www.diogo.site/projects/excalibur_template/assets/css/main.css">
<style>
@page {
size: A4 portrait;
}
@page :blank {
@top-center { content: "This page is intentionally left blank." }
}
h1 {
page-break-before: always;
}
h1, h2, h3, h4, h5 {
page-break-after: avoid;
}
table, figure {
page-break-inside: avoid;
}
@page:right{
@bottom-right {
content: "Page " counter(page) " of " counter(pages);
}
}
</style>
</head>
<body>
<header id="header">
<nav id="side-menu">
<label for="show-menu" id="menu-button">Menu</label>
<input id="show-menu" role="button" type="checkbox">
<ul id="menu">
<li><a href="../../"><strong>&larr; GS GSoC 2019</strong></a></li>
<li><a href="#about">About</a></li>
<li><a href="#image-system">Image System</a></li>
<li><a href="#embed-plugin">Embed Plugin</a></li>
<li><a href="#queue-system">Queue System</a></li>
<li><a href="#caching-system">Caching System</a></li>
<li><a href="#acknowledgements">Acknowledgements</a></li>
</ul>
</nav>
<h1 class="title">Optimizations on GNU social's Load Balance and Storage systems</h1>
<p>Developed by <strong><a href="https://loadaverage.org/biodan">Miguel Dantas</a></strong> during <strong><a href="https://summerofcode.withgoogle.com/archive/2019/#5289618063228928">Google Summer of Code 2019</a></strong></p>
<p>Mentored by <a href="https://www.diogo.site/">Diogo Cordeiro</a></p>
</header>
<h2 id="about">About</h2>
<h3 id="what-is-this-document">What is this document?</h3>
<p>This is a technical Google Summer of Code project report briefly describing what I did from May 6 to August 26.
</p>
<p>All the code I wrote during this period is available
<a href="https://notabug.org/diogo/gnu-social/src/7291e1b2a4d01a03400a1685879d4faa62c1645d">here</a>.
</p>
<p>This document explains what was done in a simple and informal way, highlighting the most relevant parts, why some decisions were made and what I've learned from it.</p>
<p>From May 6 to May 27 I had to familiarize myself with GNU social's plugin and event APIs, as well as other internal components and the community.</p>
<h3 id="what-is-gnu-social">What is GNU social?</h3>
<p>GNU social is social communication software used in federated social networks. It therefore needs queue and caching systems that are easy to use, yet powerful and robust enough to handle high load on servers with limited network bandwidth, little storage
and reduced processing power, while remaining scalable enough to take advantage of more powerful systems.</p>
<h3 id="abstract">Abstract</h3>
<p>TODO</p>
<h4 id="benefits">Benefits</h4>
<ul>
<li>Improvements to the Image System</li>
<li>Improvements to the Embed plugin</li>
<li>Improvements to the Queue System</li>
<li>Improvements to the Caching System</li>
<!--<li>Cleanup and organize the <code>lib</code> folder into semantic categories</li>-->
</ul>
<h2 id="image-system">Image System</h2>
<p>The new image handling system was my first larger project. I reviewed and refactored the existing code, then made sure a consistent, filesystem-safe encoding is used for the filename, which is in turn provided
when the file download is requested. The file download itself was moved into PHP code so that the application has direct control over which files are accessed. The main goal of this change was to allow arbitrary file uploads while still ensuring
that no file is directly publicly accessible, since a misconfigured web server or a maliciously crafted file could otherwise cause unwanted execution. It will also allow easy permission control in the future, via events, if a plugin to do so
is written. Along the same lines, it was important to make image validation more aggressive: although it is not immediately apparent, an image and a script may be contained in the same file, and it was previously possible for a file to be identified as an image by
the upload code and then be executed by the web server.</p>
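<p>As an illustration of what a filesystem-safe filename encoding can look like, here is a minimal Python sketch. The layout (hex-encoded original name, a dash, then a SHA-256 content hash) and the function names <code>safe_filename</code> and <code>original_name</code> are assumptions made for this example, not necessarily the format GNU social uses.</p>

```python
import hashlib
import re


def safe_filename(original_name: str, content: bytes) -> str:
    """Build a stored name from a hex-encoded title and a content hash.

    Hypothetical layout, chosen for illustration only.
    """
    # Hex-encoding the user-supplied name leaves only 0-9 and a-f, so path
    # separators, dots and control characters cannot survive.
    encoded_name = original_name.encode("utf-8").hex()
    # The content hash keeps different uploads with the same name apart.
    digest = hashlib.sha256(content).hexdigest()
    return f"{encoded_name}-{digest}"


def original_name(stored_name: str) -> str:
    """Recover the name to offer when the file download is requested."""
    if re.fullmatch(r"[0-9a-f]+-[0-9a-f]{64}", stored_name) is None:
        raise ValueError("not a name produced by safe_filename")
    encoded_name, _digest = stored_name.split("-", 1)
    return bytes.fromhex(encoded_name).decode("utf-8")
```

<p>Because the stored name is built only from hexadecimal digits and a dash, a name such as <code>../../etc/passwd.png</code> cannot escape the upload directory, yet the original name can still be handed back on download.</p>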
<h2 id="embed-plugin">Embed Plugin</h2>
<p>The existing Oembed plugin made a valiant attempt at retrieving both oEmbed and OpenGraph information about a page. The problem, besides the poor name given its dual purpose, was that it relied on an incomplete in-house implementation. It was therefore
replaced by an external library called <code>Embed</code>, by <code>oscarotero</code>. After refactoring and reviewing the existing code, and making sure the same filename conventions as above were in effect, image handling became significantly
more robust, eliminating the barrage of errors users would previously see in place of images. In addition, I made it so that only a thumbnail of configurable size is saved to disk for each image, while still supporting upscaling should future themes
need it, which should significantly reduce disk space usage.</p>
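<p>The arithmetic behind a configurable thumbnail size can be sketched as follows. This is a minimal Python example that assumes the thumbnail is fitted to a single configured maximum side while keeping the aspect ratio; the function name <code>thumbnail_size</code> and the <code>max_side</code> parameter are hypothetical and do not come from the plugin.</p>

```python
def thumbnail_size(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) so the longer side equals max_side.

    Hypothetical helper: the aspect ratio is preserved, and a source image
    smaller than max_side is upscaled rather than left alone.
    """
    if not (width > 0 and height > 0 and max_side > 0):
        raise ValueError("dimensions must be positive")
    longer = max(width, height)
    # Multiply before dividing so exact ratios stay exact.
    new_width = max(1, round(width * max_side / longer))
    new_height = max(1, round(height * max_side / longer))
    return new_width, new_height
```

<p>For example, with <code>max_side</code> set to 256 a 1920x1080 preview image would be stored as 256x144, which is where the disk space saving comes from.</p>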
<h2 id="queue-system">Queue System</h2>
<p>The queue system was refactored and reviewed, with the code modernized and cleaned up in places. I also added the option to use Redis as a backend, through a new <code>RedisQueue</code> plugin.</p>
<p>In addition, the existing DB, STOMP and UNQUEUE queue managers were broken out into plugins, which makes them easier to manage and the queue system more uniform.</p>
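<p>The idea of queue backends as interchangeable plugins can be sketched in a few lines of Python. Every name here (<code>QueueBackend</code>, <code>InMemoryQueue</code>, <code>QueueManager</code>) is hypothetical and is not GNU social's actual API; the in-memory backend merely stands in for a real one such as Redis, whose list push and pop operations could fill the same two methods.</p>

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Callable, Optional


class QueueBackend(ABC):
    """Interface each queue backend plugin would implement (hypothetical)."""

    @abstractmethod
    def enqueue(self, queue: str, item: str) -> None: ...

    @abstractmethod
    def dequeue(self, queue: str) -> Optional[str]: ...


class InMemoryQueue(QueueBackend):
    """Stand-in backend that keeps one FIFO per queue name in memory."""

    def __init__(self) -> None:
        self._queues = {}

    def enqueue(self, queue: str, item: str) -> None:
        self._queues.setdefault(queue, deque()).append(item)

    def dequeue(self, queue: str) -> Optional[str]:
        pending = self._queues.get(queue)
        return pending.popleft() if pending else None


class QueueManager:
    """Dispatches queued items to handlers, whatever backend is plugged in."""

    def __init__(self, backend: QueueBackend) -> None:
        self.backend = backend
        self.handlers = {}

    def connect(self, queue: str, handler: Callable[[str], None]) -> None:
        self.handlers[queue] = handler

    def run_once(self, queue: str) -> bool:
        """Handle one item; return False when the queue is empty."""
        item = self.backend.dequeue(queue)
        if item is None:
            return False
        self.handlers[queue](item)
        return True
```

<p>Since the manager only talks to the interface, swapping one backend for another is a matter of passing a different object to the constructor, which is the kind of uniformity the plugin split aims for.</p>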
<h2 id="caching-system">Caching System</h2>
<p>The caching system received the same treatment, and a new <code>RedisCache</code> plugin was created.</p>
<!--
<h2 id="lib-refactor"><code>lib</code> Refactor</h2>
<p>The entire <code>lib</code> folder was separated into categories of semantic meaning. At first I planned to use the <code>@category</code> tags in some files, but this proved both sparse and unreliable, as some seem to have been copied. This change also required updating the internal autoload system, which while clever in concept, consisted, in practice, of a bunch of <code>if</code> statements. My change made it much more easily expandable and cleaner.</p>
-->
<h2 id="acknowledgements">Final Words</h2>
<p>GSoC was a wonderful experience for me. I now feel more comfortable with GNU social's codebase, as well as more confident in tackling other large codebases. I learned a lot of useful things: general software design principles, maintainability,
web security and Redis. I've also learned more about <code>git</code> and how libre and open source software development is carried out and organized.</p>
<p>I look forward to regularly contributing to GNU social and other projects.</p>
<p>Thanks to Diogo Cordeiro for such a wonderful experience and the help and knowledge he lent.</p>
</body>
</html>