feature #24372 [DomCrawler] Default to UTF-8 when possible (nicolas-grekas)

This PR was merged into the 3.4 branch.

Discussion
----------

[DomCrawler] Default to UTF-8 when possible

| Q             | A
| ------------- | ---
| Branch?       | 3.4
| Bug fix?      | no
| New feature?  | yes
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #17258
| License       | MIT
| Doc PR        | -

Checking whether the content is valid UTF-8 can't be ambiguous, so let's use UTF-8 when possible and keep ISO-8859-1 as the fallback.
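For illustration, the detection can be sketched as a standalone function. `guessCharset()` is a hypothetical name used only here; the PR itself inlines the same `preg_match('//u', ...)` check in `Crawler::addContent()`. The sketch relies on `preg_match()` returning `false` when the `/u` modifier is given a subject that is not valid UTF-8.

```php
<?php

// Hypothetical helper (not part of the PR) wrapping the one-line check.
function guessCharset(string $content): string
{
    // With the /u modifier, PCRE validates the subject as UTF-8 first.
    // The empty pattern matches any valid string (returns 1); an invalid
    // UTF-8 subject makes preg_match() return false.
    return preg_match('//u', $content) ? 'UTF-8' : 'ISO-8859-1';
}

var_dump(guessCharset("caf\xC3\xA9")); // "é" encoded as UTF-8: string(5) "UTF-8"
var_dump(guessCharset("caf\xE9"));     // "é" encoded as ISO-8859-1: string(10) "ISO-8859-1"
var_dump(guessCharset("plain ascii")); // ASCII is valid UTF-8: string(5) "UTF-8"
```

Pure ASCII content is reported as UTF-8, which is harmless because ASCII decodes identically in both charsets.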

Commits
-------

73eda66b99 [DomCrawler] Default to UTF-8 when possible
Fabien Potencier 2017-09-29 06:59:44 -07:00
commit c10baf9e9f


@@ -127,8 +127,8 @@ class Crawler implements \Countable, \IteratorAggregate
     /**
      * Adds HTML/XML content.
      *
-     * If the charset is not set via the content type, it is assumed
-     * to be ISO-8859-1, which is the default charset defined by the
+     * If the charset is not set via the content type, it is assumed to be UTF-8,
+     * or ISO-8859-1 as a fallback, which is the default charset defined by the
      * HTTP 1.1 specification.
      *
      * @param string $content A string to parse as HTML/XML
@@ -161,7 +161,7 @@ class Crawler implements \Countable, \IteratorAggregate
 }
 if (null === $charset) {
-    $charset = 'ISO-8859-1';
+    $charset = preg_match('//u', $content) ? 'UTF-8' : 'ISO-8859-1';
 }
 if ('x' === $xmlMatches[1]) {