PHP - Differences between `get_headers` and `stream_get_meta_data`?
intro / disclaimer
Decent chunks of the outputs can largely be ignored. It's still a bit of a read, but I'm trying to be thorough in my analysis and questioning. If you're familiar with `stream_get_meta_data`, it's fine to skip to the "questions" at the end.
Other than in the docs, I'm having trouble finding out much about PHP's `stream_get_meta_data`. Its overall functionality is not vastly different from that of PHP's `get_headers`, but I cannot for the life of me find any comparisons between the two, or the pros/cons of the former.
the setup
Up until this point, I've used PHP's `get_headers` to verify the validity of a URL. The downside is that `get_headers` is notoriously slow. Understandably, some of the latency is directly due to the server hosting the site of interest, but maybe the method is also overly robust, or something else is slowing it down.
There are plenty of links that recommend using curl, claiming it's faster, but I've run side-by-side, timed tests of both, and `get_headers` has come out on top, often by a factor of 1.5 or 2.
I've yet to see any solutions using `stream_get_meta_data`, and I stumbled upon it for the first time today. I've exhausted my Google skills, without much luck. But, in the interest of optimizing my scheme, I ran some tests.
the testing
The comparisons between `get_headers` and `stream_get_meta_data` were run using a list of 106 current (i.e. live, valid, status=200) URLs:
code block #1
    // URLs are in the format "http://www.domain.com"
    $urls = array('...', '...', '...'); // *106 URLs
    $headers = array();
    $streams = array();

    // get_headers
    $start = microtime(true);
    foreach ($urls as $url) {
        try {
            // unfortunately, get_headers does not offer a context argument (before PHP 7.1)
            stream_context_set_default(array('http' => array('method' => "HEAD")));
            $headers[] = @get_headers($url, 1);
            stream_context_set_default(array('http' => array('method' => "GET")));
        } catch (Exception $e) {
            continue;
        }
    }
    $end1 = microtime(true) - $start;

    // stream_get_meta_data
    $cont = stream_context_create(array('http' => array('method' => "HEAD")));
    $start = microtime(true);
    foreach ($urls as $url) {
        try {
            $fp = fopen($url, 'rb', false, $cont);
            if (!$fp) {
                continue;
            }
            $streams[] = stream_get_meta_data($fp);
            fclose($fp); // release the handle once the metadata has been read
        } catch (Exception $e) {
            continue;
        }
    }
    $end2 = microtime(true) - $start;
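Since both loops share the same timing logic, one way to keep the comparison even is to push each method through a single helper. What follows is only a sketch: `timeProbe`, `headContext`, `probeWithGetHeaders`, and `probeWithStream` are hypothetical names that do not appear in the code above, and the third (`$context`) argument of `get_headers` is only available from PHP 7.1 onward, which is what removes the need to toggle `stream_context_set_default`.

```php
<?php
// Hypothetical helper: run $probe over every URL and return array(seconds, results).
// Probes signal failure by returning false; those URLs are skipped.
function timeProbe(callable $probe, array $urls): array
{
    $results = array();
    $start = microtime(true);
    foreach ($urls as $url) {
        $result = $probe($url);
        if ($result !== false) {
            $results[$url] = $result;
        }
    }
    return array(microtime(true) - $start, $results);
}

// HEAD request context used by both probes.
function headContext()
{
    return stream_context_create(array('http' => array('method' => 'HEAD')));
}

// get_headers with an explicit context (assumes PHP 7.1 or later).
function probeWithGetHeaders(string $url)
{
    return @get_headers($url, 1, headContext());
}

// fopen + stream_get_meta_data; returns false if the stream cannot be opened.
function probeWithStream(string $url)
{
    $fp = @fopen($url, 'rb', false, headContext());
    if (!$fp) {
        return false;
    }
    $meta = stream_get_meta_data($fp);
    fclose($fp);
    return $meta;
}
```

Usage would look like `list($seconds, $headers) = timeProbe('probeWithGetHeaders', $urls);`, and the same call with `'probeWithStream'` for the other method.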
And the results I'm getting have `stream_get_meta_data` coming out on top, 90% of the time, or more. Sometimes the times are nearly identical, but more often than not `stream_get_meta_data` has the shorter run-time.
run times #1
    "get_headers": 112.23 // seconds
    "stream_get":   42.61 // seconds
With the [stringified] outputs of the 2 being something like:
excerpt of comparison #1
    url .. "http://www.wired.com/"

    get_headers
    |    0 ............................ "http/1.1 200 ok"
    |    access-control-allow-origin .. "*"
    |    cache-control ................ "stale-while-revalidate=86400, stale-while-error=86400"
    |    content-type ................. "text/html; charset=utf-8"
    |    link ......................... "; rel=\"https://api.w.org/\""
    |    server ....................... "apache"
    |    via
    |    |    "1.1 varnish"
    |    |    "1.1 varnish"
    |
    |    fastly-debug-state ........... "hit"
    |    fastly-debug-digest .......... "c245efbf14778c681ce317da114c1a762199e1326323d07b531d765e97fc8695"
    |    content-length ............... "135495"
    |    accept-ranges ................ "bytes"
    |    date ......................... "tue, 23 aug 2016 22:32:26 gmt"
    |    age .......................... "701"
    |    connection ................... "close"
    |    x-served-by .................. "cache-jfk8149-jfk, cache-den6024-den"
    |    x-cache ...................... "hit, hit"
    |    x-cache-hits ................. "51, 1"
    |    x-timer ...................... "s1471991546.459931,vs0,ve0"
    |    vary ......................... "accept-encoding"

    stream_get
    |    wrapper_data
    |    |    "http/1.1 200 ok"
    |    |    "access-control-allow-origin: *"
    |    |    "cache-control: stale-while-revalidate=86400, stale-while-error=86400"
    |    |    "content-type: text/html; charset=utf-8"
    |    |    "link: ; rel=\"https://api.w.org/\""
    |    |    "server: apache"
    |    |    "via: 1.1 varnish"
    |    |    "fastly-debug-state: hit"
    |    |    "fastly-debug-digest: c245efbf14778c681ce317da114c1a762199e1326323d07b531d765e97fc8695"
    |    |    "content-length: 135495"
    |    |    "accept-ranges: bytes"
    |    |    "date: tue, 23 aug 2016 22:32:26 gmt"
    |    |    "via: 1.1 varnish"
    |    |    "age: 701"
    |    |    "connection: close"
    |    |    "x-served-by: cache-jfk8149-jfk, cache-den6020-den"
    |    |    "x-cache: hit, hit"
    |    |    "x-cache-hits: 51, 1"
    |    |    "x-timer: s1471991546.614958,vs0,ve0"
    |    |    "vary: accept-encoding"
    |
    |    wrapper_type ................. "http"
    |    stream_type .................. "tcp_socket/ssl"
    |    mode ......................... "rb"
    |    unread_bytes ................. 0
    |    seekable ..................... false
    |    uri .......................... "http://www.wired.com/"
    |    timed_out .................... false
    |    blocked ...................... true
    |    eof .......................... false
For the most part, it's all the same data, with the exception that `stream_get_meta_data` doesn't offer a way to include the keys for `wrapper_data` without parsing through it manually.
Easy enough...
code block #2.1/2.2
    $wd = $meta[$url]['wrapper_data'];
    $warr = wrappertokeys($wd);
where...
    function wrappertokeys($wd) {
        $warr = array();
        foreach ($wd as $row) {
            // *assuming* they're separated by ": " (might be just a colon, without a space?)
            $pos = strpos($row, ': ');
            if ($pos === false) {
                $warr[] = $row;
            } else {
                // $pos, $key and $value could be found with a single preg_match
                $key = substr($row, 0, $pos);
                $value = substr($row, ($pos + 2));
                // if the key doesn't exist yet, assign the value
                // (isset rather than empty, so a value such as "0" is not overwritten)
                if (!isset($warr[$key])) {
                    $warr[$key] = $value;
                }
                // if the key points to an array, add the value to the array
                else if (is_array($warr[$key])) {
                    $warr[$key][] = $value;
                }
                // if the key points to a string, swap the value into an array
                else {
                    $warr[$key] = array($warr[$key], $value);
                }
            }
        }
        return $warr;
    }
And the output is identical to `get_headers($url, 1)`:
excerpt of comparison #2
    url .. "http://www.wired.com/"

    headers
    |    0 ............................ "http/1.1 200 ok"
    |    access-control-allow-origin .. "*"
    |    cache-control ................ "stale-while-revalidate=86400, stale-while-error=86400"
    |    content-type ................. "text/html; charset=utf-8"
    |    link ......................... "; rel=\"https://api.w.org/\""
    |    server ....................... "apache"
    |    via
    |    |    "1.1 varnish"
    |    |    "1.1 varnish"
    |
    |    fastly-debug-state ........... "hit"
    |    fastly-debug-digest .......... "c245efbf14778c681ce317da114c1a762199e1326323d07b531d765e97fc8695"
    |    content-length ............... "135495"
    |    accept-ranges ................ "bytes"
    |    date ......................... "tue, 23 aug 2016 22:35:29 gmt"
    |    age .......................... "883"
    |    connection ................... "close"
    |    x-served-by .................. "cache-jfk8149-jfk, cache-den6027-den"
    |    x-cache ...................... "hit, hit"
    |    x-cache-hits ................. "51, 1"
    |    x-timer ...................... "s1471991729.021214,vs0,ve0"
    |    vary ......................... "accept-encoding"

    w-arr
    |    0 ............................ "http/1.1 200 ok"
    |    access-control-allow-origin .. "*"
    |    cache-control ................ "stale-while-revalidate=86400, stale-while-error=86400"
    |    content-type ................. "text/html; charset=utf-8"
    |    link ......................... "; rel=\"https://api.w.org/\""
    |    server ....................... "apache"
    |    via
    |    |    "1.1 varnish"
    |    |    "1.1 varnish"
    |
    |    fastly-debug-state ........... "hit"
    |    fastly-debug-digest .......... "c245efbf14778c681ce317da114c1a762199e1326323d07b531d765e97fc8695"
    |    content-length ............... "135495"
    |    accept-ranges ................ "bytes"
    |    date ......................... "tue, 23 aug 2016 22:35:29 gmt"
    |    age .......................... "884"
    |    connection ................... "close"
    |    x-served-by .................. "cache-jfk8149-jfk, cache-den6021-den"
    |    x-cache ...................... "hit, hit"
    |    x-cache-hits ................. "51, 1"
    |    x-timer ...................... "s1471991729.173641,vs0,ve0"
    |    vary ......................... "accept-encoding"
Even after sorting out the keys, `stream_get_meta_data` is the champion:
sample run times #2
    "get_headers": 99.51 // seconds
    "stream_get":  43.79 // seconds
Note: these tests are being run on a cheap shared server, hence the large variations in testing times. That being said, the gap between the 2 methods is highly consistent between tests.
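The comment in `wrappertokeys` wonders whether a header might use a bare colon with no following space, and notes that the split could be done with one `preg_match`. A rough sketch of that variant is below; `wrapperToKeysRegex` is a hypothetical name, and the pattern is just one reasonable guess at what counts as a header line, not a full HTTP parser.

```php
<?php
// Hypothetical regex-based variant of wrappertokeys.
function wrapperToKeysRegex(array $wd): array
{
    $warr = array();
    foreach ($wd as $row) {
        // Header name = everything before the first colon (no spaces allowed);
        // whitespace after the colon is optional.
        if (!preg_match('/^([^:\s]+):\s*(.*)$/', $row, $m)) {
            // Status lines such as "HTTP/1.1 200 OK" have no "name:" prefix.
            $warr[] = $row;
            continue;
        }
        $key = $m[1];
        $value = rtrim($m[2]);
        if (!isset($warr[$key])) {
            $warr[$key] = $value;                       // first occurrence
        } elseif (is_array($warr[$key])) {
            $warr[$key][] = $value;                     // third or later occurrence
        } else {
            $warr[$key] = array($warr[$key], $value);   // second occurrence
        }
    }
    return $warr;
}
```

Header names are compared exactly as received here, so `Via` and `via` would end up under separate keys.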
additional
For those of you who understand the C code behind PHP, and feel you might be able to gain some insight from it, the function definitions can be found at:
'get_headers' (php git)
and
'stream_get_meta_data' (php git)
questions
How come `stream_get_meta_data` is so underrepresented (in searches and available code snippets) compared to `get_headers`? The way I've worded that leads to opinions, but my intent is more along the lines of: "Is there something well-known and terrible about `stream_get_meta_data` that tends to deter people from using it?"

Similar to the previous, are there well-known, industry agreed-upon pros and cons between the two? The kinds of things that a more comprehensive understanding of CS would allude to. Perhaps `get_headers` is more secure/robust, and less susceptible to ne'er-do-wells and inconsistencies in server outputs? Or maybe `get_headers` is known to work in instances where `stream_get_meta_data` produces an error?

From what I can find, `stream_get_meta_data` does have a couple of notes and warnings (... fopen), but nothing so awful that it can't be worked around.

So long as it's safe and consistent, I'd like to incorporate it into my project, seeing as this operation is performed often, and cutting the run time in half would make a substantial difference.
edit #1
I have since found a few URLs that are successful with `get_headers` but throw a warning with `stream_get_meta_data`:
    PHP Warning: fopen(http://www.alealimay.com/): failed to open stream: HTTP request failed! HTTP/1.0 400 Bad Request
    PHP Warning: fopen(http://www.thelovelist.net/): failed to open stream: HTTP request failed! HTTP/1.0 400 Bad Request
    PHP Warning: fopen(http://www.bleedingcool.com/): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden
`get_headers` returns only a `403 Forbidden` status, even though you can paste the URLs into a browser and see that they are working sites.
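One possible workaround for the warnings above, offered as a sketch rather than a confirmed fix: the `http` context option `ignore_errors` asks the wrapper to hand back the stream even on 4xx/5xx responses, so `stream_get_meta_data` still has headers to report. It is purely an assumption that these particular sites reject requests lacking a `User-Agent` header; the `user_agent` string below is a made-up placeholder, and `lenientHeadContext` and `fetchMeta` are hypothetical names.

```php
<?php
// Build a HEAD context that does not treat 4xx/5xx responses as open failures.
function lenientHeadContext(
    string $userAgent = 'Mozilla/5.0 (compatible; ExampleLinkChecker/1.0)', // placeholder value
    float $timeout = 10.0
) {
    return stream_context_create(array(
        'http' => array(
            'method'        => 'HEAD',
            'ignore_errors' => true,       // keep the stream on error status codes
            'user_agent'    => $userAgent, // assumption: some servers refuse UA-less requests
            'timeout'       => $timeout,   // read timeout in seconds
        ),
    ));
}

// Open the URL and return its stream metadata, or false if it cannot be opened at all.
function fetchMeta(string $url, $context)
{
    $fp = @fopen($url, 'rb', false, $context);
    if ($fp === false) {
        return false;
    }
    $meta = stream_get_meta_data($fp);
    fclose($fp);
    return $meta;
}
```

With this context, a 403 response should appear as the first entry of `wrapper_data` instead of as a warning, though whether the sites then answer 200 depends on why they refused the request in the first place.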
I'm unsure about this: both the break-down of `stream_get_meta_data`, and the incomplete header from `get_headers` (it should include all redirects and a final `status_code = 200` for functioning sites).
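To check whether a header list really does contain the redirects and a final 200, the status lines can be pulled out on their own. This sketch assumes redirects are being followed (the `http` wrapper's default `follow_location` behaviour), in which case each hop contributes its own `HTTP/...` status line; `statusChain` and `finalStatus` are hypothetical names. It accepts either `wrapper_data` or the array returned by `get_headers`.

```php
<?php
// Collect every status code found in a list of header lines, in order of appearance.
function statusChain(array $headerLines): array
{
    $codes = array();
    foreach ($headerLines as $line) {
        // get_headers($url, 1) can contain nested arrays (e.g. repeated "via"); skip those.
        if (is_string($line) && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#i', $line, $m)) {
            $codes[] = (int) $m[1];
        }
    }
    return $codes;
}

// Status code of the last hop, or null if no status line was found.
function finalStatus(array $headerLines)
{
    $codes = statusChain($headerLines);
    return $codes ? end($codes) : null;
}
```

For example, `finalStatus($meta['wrapper_data'])` would give the code of the last response in the chain.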
Much thanks, if you've made it this far.
Also, please comment if you down-vote, so that I might be able to improve the question, and we can all learn for future cases.