PHP - `get_headers` returns "400 Bad Request" and "403 Forbidden" for valid URLs?
Working solution at the bottom of the question!
I am running PHP 5.4, and am trying to get the headers of a list of URLs. For the most part, everything is working fine, but there are 3 URLs causing issues (and likely more, with more extensive testing):
http://www.alealimay.com
http://www.thelovelist.net
http://www.bleedingcool.com
All 3 sites work fine in a browser, and produce the following header responses:
(Screenshots from Safari; note all 3 header responses have code = 200.)
But retrieving the headers via PHP, using get_headers:

```php
stream_context_set_default(array('http' => array('method' => "HEAD")));
$headers = get_headers($url, 1);
stream_context_set_default(array('http' => array('method' => "GET")));
```

...returns the following:
url ...... "http://www.alealimay.com" headers | 0 ............................ "http/1.0 400 bad request" | content-length ............... "378" | x-synthetic .................. "true" | expires ...................... "thu, 01 jan 1970 00:00:00 utc" | pragma ....................... "no-cache" | cache-control ................ "no-cache, must-revalidate" | content-type ................. "text/html; charset=utf-8" | connection ................... "close" | date ......................... "wed, 24 aug 2016 01:26:21 utc" | x-contextid .................. "qifb0i8v/xstfmreg" | x-via ........................ "1.0 echo109" url ...... "http://www.thelovelist.net" headers | 0 ............................ "http/1.0 400 bad request" | content-length ............... "378" | x-synthetic .................. "true" | expires ...................... "thu, 01 jan 1970 00:00:00 utc" | pragma ....................... "no-cache" | cache-control ................ "no-cache, must-revalidate" | content-type ................. "text/html; charset=utf-8" | connection ................... "close" | date ......................... "wed, 24 aug 2016 01:26:22 utc" | x-contextid .................. "ankvf2rb/bimjwyjw" | x-via ........................ "1.0 echo103" url ...... "http://www.bleedingcool.com" headers | 0 ............................ "http/1.1 403 forbidden" | server ....................... "sucuri/cloudproxy" | date ......................... "wed, 24 aug 2016 01:26:22 gmt" | content-type ................. "text/html" | content-length ............... "5311" | connection ................... "close" | vary ......................... "accept-encoding" | etag ......................... "\"57b7f28e-14bf\"" | x-xss-protection ............. "1; mode=block" | x-frame-options .............. "sameorigin" | x-content-type-options ....... "nosniff" | x-sucuri-id .................. "11005"
This is the case regardless of the stream_context; commenting it out:

```php
//stream_context_set_default(array('http' => array('method' => "HEAD")));
$headers = get_headers($url, 1);
//stream_context_set_default(array('http' => array('method' => "GET")));
```

produces the same result.
No warnings or errors are thrown for any of these (I normally have errors suppressed with @get_headers, but there is no difference either way).
I have checked php.ini, and have allow_url_fopen set to On.
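(A quick way to confirm that at runtime, as a minimal sketch; note that allow_url_fopen is PHP_INI_SYSTEM, so ini_get() can read it but ini_set() cannot change it:)

```php
<?php
// Confirm the http:// stream wrapper is available to
// get_headers()/fopen() without re-reading php.ini.
var_dump(filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)); // expect bool(true)
```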
I am eventually headed towards stream_get_meta_data, and am not interested in cURL solutions. stream_get_meta_data (and its accompanying fopen) fails in the same spot as get_headers, so fixing one will fix both in this case (a sketch of that route follows below).
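(For reference, a minimal sketch of the stream_get_meta_data route, my own illustration rather than the original post's code, assuming the same HEAD context as above; the 'ignore_errors' context option keeps the stream open on 4xx responses so the headers can still be read:)

```php
<?php
// Open the URL through the http:// wrapper and read the raw
// response header lines from the stream's metadata.
$url     = 'http://www.alealimay.com'; // one of the failing URLs
$context = stream_context_create(array('http' => array(
    'method'        => 'HEAD',
    'ignore_errors' => true, // still return a stream on 4xx/5xx responses
)));
$fp = @fopen($url, 'r', false, $context);
if ($fp !== false) {
    $meta = stream_get_meta_data($fp);
    print_r($meta['wrapper_data']); // e.g. ["HTTP/1.0 400 Bad Request", ...]
    fclose($fp);
}
```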
Usually, if there are redirects, the output looks like this:
url ...... "http://www.startingurl.com/" headers | 0 ............................ "http/1.1 301 moved permanently" | 1 ............................ "http/1.1 200 ok" | date | | "wed, 24 aug 2016 02:02:29 gmt" | | "wed, 24 aug 2016 02:02:32 gmt" | | server | | "apache" | | "apache" | | location ..................... "http://finishingurl.com/" | connection | | "close" | | "close" | | content-type | | "text/html; charset=utf-8" | | "text/html; charset=utf-8" | | link ......................... "; rel=\"https://api.w.org/\", ; rel=shortlink"
How come the sites work in browsers, but fail when using get_headers?
There are various posts discussing the same thing, but the solution in each of them doesn't pertain to this case:
- POST requires a Content-Length header (I'm sending a HEAD request; no content is returned)
- The URL contains UTF-8 data (the only chars in these URLs are from the Latin alphabet)
- Cannot send a URL with spaces in it (these URLs are space-free, and ordinary in every way)
Solution!

(Thanks to Max in the answers below for pointing me on the right track.)

The issue is that there is no pre-defined user_agent, unless one is either set in php.ini or declared in code. So, change the user_agent to mimic a browser, do the deed, and then revert it to its starting value (likely blank):
```php
$originalUserAgent = ini_get('user_agent');
ini_set('user_agent', 'Mozilla/5.0');
$headers = @get_headers($url, 1);
ini_set('user_agent', $originalUserAgent);
```
The user agent change was found here.
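(An alternative sketch, not from the original post: the http stream wrapper also accepts the user agent through its 'user_agent' context option, so the same fix can ride along on the default context already used above for the HEAD method. This avoids touching ini settings; get_headers() itself did not accept a context argument until PHP 7.1, but it does honor the default context:)

```php
// Same fix via the default stream context instead of ini_set():
// the 'user_agent' option sets the User-Agent header for all
// http:// wrapper requests, including get_headers().
stream_context_set_default(array(
    'http' => array(
        'method'     => 'HEAD',
        'user_agent' => 'Mozilla/5.0', // any browser-like string works here
    ),
));
$headers = @get_headers($url, 1);
```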
It happens because these 3 sites are checking the User-Agent header of the request, and respond with an error if it isn't matched. The get_headers function does not send that header. You may try cURL and this code snippet for getting the content of the sites:
```php
$url = 'http://www.alealimay.com';
$c = curl_init($url);
curl_setopt($c, CURLOPT_USERAGENT, 'curl/7.48.0');
curl_exec($c);
var_dump(curl_getinfo($c));
```
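(If only the headers are needed, a small variation on that snippet skips the body; a sketch using standard cURL options, where CURLOPT_NOBODY switches the request to HEAD:)

```php
$url = 'http://www.alealimay.com';
$c = curl_init($url);
curl_setopt($c, CURLOPT_USERAGENT, 'curl/7.48.0');
curl_setopt($c, CURLOPT_NOBODY, true);         // send HEAD, skip the body
curl_setopt($c, CURLOPT_HEADER, true);         // include headers in the output
curl_setopt($c, CURLOPT_RETURNTRANSFER, true); // return instead of echoing
$raw = curl_exec($c);
echo $raw; // raw status line plus header block
curl_close($c);
```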
UPD: It's not necessary to use cURL just for setting the user agent header. It can be done with ini_set('user_agent', 'Mozilla/5.0'); and then the get_headers function will use the configured value.