merged branch vicb/url_decoding (PR #3780)

Commits
-------

55014a6 [Routing] Request methods always return a raw path, fix the matcher to decode only once
d17ba0e Fixed base URL detection when request URI contains encoded chars

Discussion
----------

[RFC] Fix issues with url decoding

Related: #2324, #2963, #2962, #2579

### This PR fixes two issues:

* `+` in paths were turned to " " by `urldecode()`
* `urldecode()` was called a few times (and a different number of times according to which part of the path was handled, see #2962 for details).

### BC Breaks:

* `Request::getPathInfo()`, `Request::getBaseUrl()` and `Request::getBasePath` now return the raw (encoded) path (vs a decoded path before this PR). You should check any calls to these methods in your code and wrap them in `rawurldecode()` if needed.
* The `UrlMatcher` now decodes the URL only once (i.e. variable are no more decoded twice) and use `rawurldecode()` to support `+`.

### Notes:

* @arnaud-lb, the first commit is based on your #2963 so I put your name for the commit,

### Comment history from the original PR:

@vicb

**The state before this PR:**

* getRequestUri() returns an **encoded** value
* getPathInfo() returns a **decoded** value
* getBaseUrl() returns a **decoded** value
* getBasePath() returns a **decoded** value

The decoded value is wrong as `urldecode` is used in place of `rawurldecode` turning `+` into a space character (#2324).

The matcher starts by urldecoing the path (it is already decoded as explained right before) and then urldecodes each variable one more time.

We end up with a path being decoded twice and variables being decoded three times.

`Request::getUri()` calls both `getBaseUrl()` and `getPathInfo()` so that the return URI is **decoded**.

**The state after the PR:**

* getRequestUri() returns an **encoded** value
* getPathInfo() returns an **encoded** value
* getBaseUrl() returns an **encoded** value
* getBasePath() returns an **encoded** value

We are consistent and we have the raw values everywhere - there is no (easy) way to get the encoded value back once it has been decoded as it is done in the current code.

The matcher relies on an encoded value and decode the value only once (using `rawurldecode` to support `+`s).

So basically this PR:

* fix a bug - URL with `+` are now supported,
* makes paths consistent - encoded values everywhere, including `getUri()`
* makes variables consistent: they are decoded only once - the same as query string parameters.

There are some BC breaks:

* getPathInfo() returns an encoded value vs a decoded one before,
* getBaseUrl() returns an encoded value vs a decoded one before.
* getBasePath() returns an encoded value vs a decoded one before.

Any code relying on the output of one of the 2 previous methods should be checked and upgraded if needed. I am interested in the use cases where your code need to be updated.

@Seldaek

I checked a few projects and this is what I found (usage of getPathInfo & getBaseUrl):

- One use case of getPathInfo to check if the url started with `/something/` which is a prefix used for all "overlay" content which had to be treated differently somewhere => most likely unaffected
- One use case for checking path prefixes by regex in our [CorsBundle](https://github.com/nelmio/NelmioCorsBundle/blob/master/EventListener/CorsListener.php#L52-56) => potentially affected depending on the complexity of regexes I'd say

@vicb

Thanks @Seldaek for reporting the use cases. You second case would be solved by `rawurldecode`ing the path info which is a minimal change.

And in general I think we have to expand to doc to specify the url format that should be used.

---------------------------------------------------------------------------

by vicb at 2012-04-04T13:42:21Z

I'll squash the commits before this gets merged but for now it make the review easier.
This commit is contained in:
Fabien Potencier 2012-04-10 10:43:27 +02:00
commit 8860424823
14 changed files with 213 additions and 35 deletions

View File

@ -329,6 +329,10 @@ To get the diff between two versions, go to https://github.com/symfony/symfony/c
* Request::getClientIp() method doesn't take a parameter anymore but bases
itself on the trustProxy parameter.
* Added isMethod() to Request object.
* [BC BREAK] The methods `getPathInfo()`, `getBaseUrl()` and `getBasePath()` of
a `Request` now all return a raw value (vs a urldecoded value before). Any call
to one of these methods must be checked and wrapped in a `rawurldecode()` if
needed.
### HttpKernel
@ -356,6 +360,9 @@ To get the diff between two versions, go to https://github.com/symfony/symfony/c
* added a TraceableUrlMatcher
* added the possibility to define options, default values and requirements for placeholders in prefix, including imported routes
* added RouterInterface::getRouteCollection
* [BC BREAK] the UrlMatcher urldecodes the route parameters only once, they were
decoded twice before. Note that the `urldecode()` calls have been change for a
single `rawurldecode()` in order to support `+` for input paths.
### Security

View File

@ -130,7 +130,10 @@ class Configuration implements ConfigurationInterface
->performNoDeepMerging()
->children()
->scalarNode('ip')->end()
->scalarNode('path')->end()
->scalarNode('path')
->setInfo('use the urldecoded format')
->setExample('^/path to resource/')
->end()
->scalarNode('service')->end()
->end()
->end()

View File

@ -154,7 +154,11 @@ class MainConfiguration implements ConfigurationInterface
->prototype('array')
->children()
->scalarNode('requires_channel')->defaultNull()->end()
->scalarNode('path')->defaultNull()->end()
->scalarNode('path')
->defaultNull()
->setInfo('use the urldecoded format')
->setExample('^/path to resource/')
->end()
->scalarNode('host')->defaultNull()->end()
->scalarNode('ip')->defaultNull()->end()
->arrayNode('methods')

View File

@ -16,6 +16,14 @@ use Symfony\Component\HttpFoundation\Session\SessionInterface;
/**
* Request represents an HTTP request.
*
* The methods dealing with URL accept / return a raw path (% encoded):
* * getBasePath
* * getBaseUrl
* * getPathInfo
* * getRequestUri
* * getUri
* * getUriForPath
*
* @author Fabien Potencier <fabien@symfony.com>
*
* @api
@ -568,9 +576,10 @@ class Request
*
* * http://localhost/mysite returns an empty string
* * http://localhost/mysite/about returns '/about'
* * htpp://localhost/mysite/enco%20ded returns '/enco%20ded'
* * http://localhost/mysite/about?var=1 returns '/about'
*
* @return string
* @return string The raw path (i.e. not urldecoded)
*
* @api
*/
@ -588,11 +597,12 @@ class Request
*
* Suppose that an index.php file instantiates this request object:
*
* * http://localhost/index.php returns an empty string
* * http://localhost/index.php/page returns an empty string
* * http://localhost/web/index.php return '/web'
* * http://localhost/index.php returns an empty string
* * http://localhost/index.php/page returns an empty string
* * http://localhost/web/index.php returns '/web'
* * http://localhost/we%20b/index.php returns '/we%20b'
*
* @return string
* @return string The raw path (i.e. not urldecoded)
*
* @api
*/
@ -613,7 +623,7 @@ class Request
* This is similar to getBasePath(), except that it also includes the
* script filename (e.g. index.php) if one exists.
*
* @return string
* @return string The raw url (i.e. not urldecoded)
*
* @api
*/
@ -698,7 +708,7 @@ class Request
/**
* Returns the requested URI.
*
* @return string
* @return string The raw URI (i.e. not urldecoded)
*
* @api
*/
@ -1301,14 +1311,14 @@ class Request
// Does the baseUrl have anything in common with the request_uri?
$requestUri = $this->getRequestUri();
if ($baseUrl && 0 === strpos($requestUri, $baseUrl)) {
if ($baseUrl && false !== $prefix = $this->getUrlencodedPrefix($requestUri, $baseUrl)) {
// full $baseUrl matches
return $baseUrl;
return $prefix;
}
if ($baseUrl && 0 === strpos($requestUri, dirname($baseUrl))) {
if ($baseUrl && false !== $prefix = $this->getUrlencodedPrefix($requestUri, dirname($baseUrl))) {
// directory portion of $baseUrl matches
return rtrim(dirname($baseUrl), '/');
return rtrim($prefix, '/');
}
$truncatedRequestUri = $requestUri;
@ -1317,7 +1327,7 @@ class Request
}
$basename = basename($baseUrl);
if (empty($basename) || !strpos($truncatedRequestUri, $basename)) {
if (empty($basename) || !strpos(rawurldecode($truncatedRequestUri), $basename)) {
// no match whatsoever; set it blank
return '';
}
@ -1378,7 +1388,7 @@ class Request
$requestUri = substr($requestUri, 0, $pos);
}
if ((null !== $baseUrl) && (false === ($pathInfo = substr(urldecode($requestUri), strlen(urldecode($baseUrl)))))) {
if ((null !== $baseUrl) && (false === ($pathInfo = substr($requestUri, strlen($baseUrl))))) {
// If substr() returns false then PATH_INFO is set to an empty string
return '/';
} elseif (null === $baseUrl) {
@ -1423,4 +1433,28 @@ class Request
} catch (\Exception $e) {
}
}
/*
* Returns the prefix as encoded in the string when the string starts with
* the given prefix, false otherwise.
*
* @param string $string The urlencoded string
* @param string $prefix The prefix not encoded
*
* @return string|false The prefix as it is encoded in $string, or false
*/
private function getUrlencodedPrefix($string, $prefix)
{
if (0 !== strpos(rawurldecode($string), $prefix)) {
return false;
}
$len = strlen($prefix);
if (preg_match("#^(%[[:xdigit:]]{2}|.){{$len}}#", $string, $match)) {
return $match[0];
}
return false;
}
}

View File

@ -127,7 +127,7 @@ class RequestMatcher implements RequestMatcherInterface
if (null !== $this->path) {
$path = str_replace('#', '\\#', $this->path);
if (!preg_match('#'.$path.'#', $request->getPathInfo())) {
if (!preg_match('#'.$path.'#', rawurldecode($request->getPathInfo()))) {
return false;
}
}

View File

@ -150,6 +150,14 @@ class RequestMatcherTest extends \PHPUnit_Framework_TestCase
$this->assertFalse($matcher->matches($request));
}
public function testPathWithEncodedCharacters()
{
$matcher = new RequestMatcher();
$request = Request::create('/admin/fo%20o');
$matcher->matchPath('^/admin/fo o*$');
$this->assertTrue($matcher->matches($request));
}
public function testAttributes()
{
$matcher = new RequestMatcher();

View File

@ -315,6 +315,27 @@ class RequestTest extends \PHPUnit_Framework_TestCase
$request->initialize(array(), array(), array(), array(), array(), $server);
$this->assertEquals('http://servername/path/info?query=string', $request->getUri(), '->getUri() with rewrite, default port without HOST_HEADER');
// With encoded characters
$server = array(
'HTTP_HOST' => 'hostname:8080',
'SERVER_NAME' => 'servername',
'SERVER_PORT' => '8080',
'QUERY_STRING' => 'query=string',
'REQUEST_URI' => '/ba%20se/index_dev.php/foo%20bar/in+fo?query=string',
'SCRIPT_NAME' => '/ba se/index_dev.php',
'PATH_TRANSLATED' => 'redirect:/index.php/foo bar/in+fo',
'PHP_SELF' => '/ba se/index_dev.php/path/info',
'SCRIPT_FILENAME' => '/some/where/ba se/index_dev.php',
);
$request->initialize(array(), array(), array(), array(), array(), $server);
$this->assertEquals(
'http://hostname:8080/ba%20se/index_dev.php/foo%20bar/in+fo?query=string',
$request->getUri()
);
}
/**
@ -751,18 +772,11 @@ class RequestTest extends \PHPUnit_Framework_TestCase
$this->assertEquals('/path/info', $request->getPathInfo());
$server = array();
$server['REQUEST_URI'] = '/path test/info';
$request->initialize(array(), array(), array(), array(), array(), $server);
$this->assertEquals('/path test/info', $request->getPathInfo());
$server = array();
$server['REQUEST_URI'] = '/path%20test/info';
$request->initialize(array(), array(), array(), array(), array(), $server);
$this->assertEquals('/path test/info', $request->getPathInfo());
$this->assertEquals('/path%20test/info', $request->getPathInfo());
}
public function testGetPreferredLanguage()
@ -946,6 +960,100 @@ class RequestTest extends \PHPUnit_Framework_TestCase
Request::trustProxyData();
}
/**
* @dataProvider getBaseUrlData
*/
public function testGetBaseUrl($uri, $server, $expectedBaseUrl, $expectedPathInfo)
{
$request = Request::create($uri, 'GET', array(), array(), array(), $server);
$this->assertSame($expectedBaseUrl, $request->getBaseUrl(), 'baseUrl');
$this->assertSame($expectedPathInfo, $request->getPathInfo(), 'pathInfo');
}
public function getBaseUrlData()
{
return array(
array(
'/foo%20bar',
array(
'SCRIPT_FILENAME' => '/home/John Doe/public_html/foo bar/app.php',
'SCRIPT_NAME' => '/foo bar/app.php',
'PHP_SELF' => '/foo bar/app.php',
),
'/foo%20bar',
'/',
),
array(
'/foo%20bar/home',
array(
'SCRIPT_FILENAME' => '/home/John Doe/public_html/foo bar/app.php',
'SCRIPT_NAME' => '/foo bar/app.php',
'PHP_SELF' => '/foo bar/app.php',
),
'/foo%20bar',
'/home',
),
array(
'/foo%20bar/app.php/home',
array(
'SCRIPT_FILENAME' => '/home/John Doe/public_html/foo bar/app.php',
'SCRIPT_NAME' => '/foo bar/app.php',
'PHP_SELF' => '/foo bar/app.php',
),
'/foo%20bar/app.php',
'/home',
),
array(
'/foo%20bar/app.php/home%3Dbaz',
array(
'SCRIPT_FILENAME' => '/home/John Doe/public_html/foo bar/app.php',
'SCRIPT_NAME' => '/foo bar/app.php',
'PHP_SELF' => '/foo bar/app.php',
),
'/foo%20bar/app.php',
'/home%3Dbaz',
),
array(
'/foo/bar+baz',
array(
'SCRIPT_FILENAME' => '/home/John Doe/public_html/foo/app.php',
'SCRIPT_NAME' => '/foo/app.php',
'PHP_SELF' => '/foo/app.php',
),
'/foo',
'/bar+baz',
),
);
}
/**
* @dataProvider urlencodedStringPrefixData
*/
public function testUrlencodedStringPrefix($string, $prefix, $expect)
{
$request = new Request;
$me = new \ReflectionMethod($request, 'getUrlencodedPrefix');
$me->setAccessible(true);
$this->assertSame($expect, $me->invoke($request, $string, $prefix));
}
public function urlencodedStringPrefixData()
{
return array(
array('foo', 'foo', 'foo'),
array('fo%6f', 'foo', 'fo%6f'),
array('foo/bar', 'foo', 'foo'),
array('fo%6f/bar', 'foo', 'fo%6f'),
array('f%6f%6f/bar', 'foo', 'f%6f%6f'),
array('%66%6F%6F/bar', 'foo', '%66%6F%6F'),
array('fo+o/bar', 'fo+o', 'fo+o'),
array('fo%2Bo/bar', 'fo+o', 'fo%2Bo'),
);
}
private function stopTrustingProxyData()
{
$class = new \ReflectionClass('Symfony\\Component\\HttpFoundation\\Request');

View File

@ -62,7 +62,7 @@ class PhpMatcherDumper extends MatcherDumper
public function match(\$pathinfo)
{
\$allow = array();
\$pathinfo = urldecode(\$pathinfo);
\$pathinfo = rawurldecode(\$pathinfo);
$code

View File

@ -43,8 +43,6 @@ class TraceableUrlMatcher extends UrlMatcher
protected function matchCollection($pathinfo, RouteCollection $routes)
{
$pathinfo = urldecode($pathinfo);
foreach ($routes as $name => $route) {
if ($route instanceof RouteCollection) {
if (!$ret = $this->matchCollection($pathinfo, $route)) {

View File

@ -72,7 +72,14 @@ class UrlMatcher implements UrlMatcherInterface
}
/**
* {@inheritDoc}
* Tries to match a URL with a set of routes.
*
* @param string $pathinfo The path info to be parsed (raw format, i.e. not urldecoded)
*
* @return array An array of parameters
*
* @throws ResourceNotFoundException If the resource could not be found
* @throws MethodNotAllowedException If the resource was found but the request method is not allowed
*
* @api
*/
@ -80,7 +87,7 @@ class UrlMatcher implements UrlMatcherInterface
{
$this->allow = array();
if ($ret = $this->matchCollection(urldecode($pathinfo), $this->routes)) {
if ($ret = $this->matchCollection(rawurldecode($pathinfo), $this->routes)) {
return $ret;
}
@ -177,7 +184,7 @@ class UrlMatcher implements UrlMatcherInterface
$parameters = $defaults;
foreach ($params as $key => $value) {
if (!is_int($key)) {
$parameters[$key] = rawurldecode($value);
$parameters[$key] = $value;
}
}

View File

@ -30,7 +30,7 @@ interface UrlMatcherInterface extends RequestContextAwareInterface
* If the matcher can not find information, it must throw one of the exceptions documented
* below.
*
* @param string $pathinfo The path info to be parsed
* @param string $pathinfo The path info to be parsed (raw format, i.e. not urldecoded)
*
* @return array An array of parameters
*

View File

@ -23,7 +23,7 @@ class ProjectUrlMatcher extends Symfony\Component\Routing\Matcher\UrlMatcher
public function match($pathinfo)
{
$allow = array();
$pathinfo = urldecode($pathinfo);
$pathinfo = rawurldecode($pathinfo);
// foo
if (0 === strpos($pathinfo, '/foo') && preg_match('#^/foo/(?P<bar>baz|symfony)$#xs', $pathinfo, $matches)) {

View File

@ -23,7 +23,7 @@ class ProjectUrlMatcher extends Symfony\Component\Routing\Tests\Fixtures\Redirec
public function match($pathinfo)
{
$allow = array();
$pathinfo = urldecode($pathinfo);
$pathinfo = rawurldecode($pathinfo);
// foo
if (0 === strpos($pathinfo, '/foo') && preg_match('#^/foo/(?P<bar>baz|symfony)$#xs', $pathinfo, $matches)) {

View File

@ -157,8 +157,8 @@ class UrlMatcherTest extends \PHPUnit_Framework_TestCase
$collection->add('foo', new Route('/{foo}/bar', array(), array('foo' => '['.preg_quote($chars).']+')));
$matcher = new UrlMatcher($collection, new RequestContext(), array());
$this->assertEquals(array('_route' => 'foo', 'foo' => $chars), $matcher->match('/'.urlencode($chars).'/bar'));
$this->assertEquals(array('_route' => 'foo', 'foo' => $chars), $matcher->match('/'.strtr($chars, array('%' => '%25', '+' => '%2B')).'/bar'));
$this->assertEquals(array('_route' => 'foo', 'foo' => $chars), $matcher->match('/'.rawurlencode($chars).'/bar'));
$this->assertEquals(array('_route' => 'foo', 'foo' => $chars), $matcher->match('/'.strtr($chars, array('%' => '%25')).'/bar'));
}
public function testMatchWithDotMetacharacterInRequirements()
@ -216,4 +216,13 @@ class UrlMatcherTest extends \PHPUnit_Framework_TestCase
$matcher = new UrlMatcher($coll, new RequestContext());
$matcher->match('/foo');
}
public function testDecodeOnce()
{
$coll = new RouteCollection();
$coll->add('foo', new Route('/foo/{foo}'));
$matcher = new UrlMatcher($coll, new RequestContext());
$this->assertEquals(array('foo' => 'bar%23', '_route' => 'foo'), $matcher->match('/foo/bar%2523'));
}
}