feature #24372 [DomCrawler] Default to UTF-8 when possible (nicolas-grekas)
This PR was merged into the 3.4 branch.
Discussion
----------
[DomCrawler] Default to UTF-8 when possible
| Q | A
| ------------- | ---
| Branch? | 3.4
| Bug fix? | no
| New feature? | yes
| BC breaks? | no
| Deprecations? | no
| Tests pass? | yes
| Fixed tickets | #17258
| License | MIT
| Doc PR | -
This can't be ambiguous: let's use UTF-8 when possible.
Commits
-------
73eda66b99 [DomCrawler] Default to UTF-8 when possible
commit c10baf9e9f
@@ -127,8 +127,8 @@ class Crawler implements \Countable, \IteratorAggregate
     /**
      * Adds HTML/XML content.
      *
-     * If the charset is not set via the content type, it is assumed
-     * to be ISO-8859-1, which is the default charset defined by the
+     * If the charset is not set via the content type, it is assumed to be UTF-8,
+     * or ISO-8859-1 as a fallback, which is the default charset defined by the
      * HTTP 1.1 specification.
      *
      * @param string $content A string to parse as HTML/XML
@@ -161,7 +161,7 @@ class Crawler implements \Countable, \IteratorAggregate
         }
 
         if (null === $charset) {
-            $charset = 'ISO-8859-1';
+            $charset = preg_match('//u', $content) ? 'UTF-8' : 'ISO-8859-1';
         }
 
         if ('x' === $xmlMatches[1]) {
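The charset fallback in the second hunk can be tried in isolation. The following is a minimal sketch, not the Crawler code itself: `guessCharset()` is a hypothetical helper name introduced here for illustration, and only the `preg_match('//u', ...)` expression is taken from the diff. It relies on PCRE validating the subject string as UTF-8 when the `u` modifier is set, so the empty pattern matches any valid UTF-8 string and fails on invalid byte sequences.

```php
<?php

/**
 * Hypothetical helper (not part of Symfony): pick a charset for content
 * whose content type did not declare one.
 */
function guessCharset(string $content): string
{
    // With the "u" modifier, PCRE checks that the subject is valid UTF-8
    // before matching. The empty pattern then matches (returns 1) for any
    // valid UTF-8 string, and preg_match() returns false otherwise.
    return preg_match('//u', $content) ? 'UTF-8' : 'ISO-8859-1';
}

var_dump(guessCharset("caf\xC3\xA9")); // "é" encoded as two UTF-8 bytes
var_dump(guessCharset("caf\xE9"));     // lone 0xE9 byte, not valid UTF-8
var_dump(guessCharset('plain ascii')); // ASCII is a subset of UTF-8
```

Pure ASCII content validates as UTF-8, so it is reported as UTF-8 under this check; ISO-8859-1 is chosen only when the bytes cannot be decoded as UTF-8.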