PHP - Differences between `get_headers` and `stream_get_meta_data`?
Intro / disclaimer
Decent chunks of the outputs can largely be ignored. It's still a bit of a read, since I'm trying to be thorough in my analysis and questioning. If you're familiar with stream_get_meta_data, feel free to skip to the "Questions" at the end.
Other than in the docs, I'm having trouble finding out much about PHP's stream_get_meta_data. Its overall functionality is not vastly different from that of PHP's get_headers, but I cannot for the life of me find any comparisons between the two, or any pros/cons of the former.
The setup
Up until this point, I've used PHP's get_headers to verify the validity of a URL. The downside of get_headers is that it's notoriously slow. Understandably, much of the latency is directly due to the server hosting the site of interest, but maybe the method is overly robust, or something else is slowing it down.
There are plenty of links that recommend using cURL, claiming it's faster, but I've run side-by-side, timed tests of both, and get_headers has come out on top by a factor of 1.5 or 2.
I've yet to see any solutions using stream_get_meta_data, and I only stumbled upon it for the first time today. I've exhausted my Google skills without luck. But, in the interest of optimizing my scheme, I ran some tests.
The testing
Comparisons between get_headers and stream_get_meta_data were run using a list of 106 current (i.e. live, valid, status=200) URLs:
Code block #1

```php
// URLs in the format "http://www.domain.com"
$urls = array('...', '...', '...'); // *106 URLs
$headers = array();
$streams = array();

// get_headers
$start = microtime(true);
foreach ($urls as $url) {
    try {
        // get_headers only accepts a context argument as of PHP 7.1,
        // so on older versions swap the default context instead
        stream_context_set_default(array('http' => array('method' => 'HEAD')));
        $headers[] = @get_headers($url, 1);
        stream_context_set_default(array('http' => array('method' => 'GET')));
    } catch (Exception $e) {
        continue;
    }
}
$end1 = microtime(true) - $start;

// stream_get_meta_data
$cont = stream_context_create(array('http' => array('method' => 'HEAD')));
$start = microtime(true);
foreach ($urls as $url) {
    try {
        $fp = fopen($url, 'rb', false, $cont);
        if (!$fp) {
            continue;
        }
        $streams[] = stream_get_meta_data($fp);
        fclose($fp); // release the connection before the next request
    } catch (Exception $e) {
        continue;
    }
}
$end2 = microtime(true) - $start;
```

And the results I'm getting have stream_get_meta_data coming out on top 90% of the time, or more. Sometimes the times are identical, but more often than not stream_get_meta_data has the shorter run time.
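The two loops above differ in more than the function being timed: the get_headers loop also resets the default context on every iteration. A minimal sketch of a more even-handed harness, assuming PHP 7.1+ so the same HEAD context can be passed straight to get_headers as its third argument. The names timeEach, $viaGetHeaders and $viaMetaData are hypothetical:

```php
<?php
// Hypothetical helper: run $fn over every input and time the whole loop.
function timeEach(callable $fn, array $inputs): array
{
    $results = [];
    $start = microtime(true);
    foreach ($inputs as $input) {
        $results[] = $fn($input);
    }
    return ['seconds' => microtime(true) - $start, 'results' => $results];
}

// One HEAD context shared by both methods (assumes PHP 7.1+ for get_headers).
$headCtx = stream_context_create(['http' => ['method' => 'HEAD']]);

$viaGetHeaders = function (string $url) use ($headCtx) {
    return @get_headers($url, 1, $headCtx);
};

$viaMetaData = function (string $url) use ($headCtx) {
    $fp = @fopen($url, 'rb', false, $headCtx);
    if ($fp === false) {
        return false;
    }
    $meta = stream_get_meta_data($fp);
    fclose($fp); // close so connections don't pile up across iterations
    return $meta;
};
```

Usage would be along the lines of `timeEach($viaGetHeaders, $urls)['seconds']`. Alternating which method runs first between runs is one way to check that warm DNS or CDN caches (note the "x-cache: hit" headers) aren't favouring whichever loop runs second.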
Run times #1

```
"get_headers": 112.23 // seconds
"stream_get":   42.61 // seconds
```

With the [stringified] outputs of the two being like:
Excerpt of comparison #1

```
url .. "http://www.wired.com/"
get_headers
| 0 ............................ "http/1.1 200 ok"
| access-control-allow-origin .. "*"
| cache-control ................ "stale-while-revalidate=86400, stale-while-error=86400"
| content-type ................. "text/html; charset=utf-8"
| link ......................... "; rel=\"https://api.w.org/\""
| server ....................... "apache"
| via
| | "1.1 varnish"
| | "1.1 varnish"
|
| fastly-debug-state ........... "hit"
| fastly-debug-digest .......... "c245efbf14778c681ce317da114c1a762199e1326323d07b531d765e97fc8695"
| content-length ............... "135495"
| accept-ranges ................ "bytes"
| date ......................... "tue, 23 aug 2016 22:32:26 gmt"
| age .......................... "701"
| connection ................... "close"
| x-served-by .................. "cache-jfk8149-jfk, cache-den6024-den"
| x-cache ...................... "hit, hit"
| x-cache-hits ................. "51, 1"
| x-timer ...................... "s1471991546.459931,vs0,ve0"
| vary ......................... "accept-encoding"
stream_get
| wrapper_data
| | "http/1.1 200 ok"
| | "access-control-allow-origin: *"
| | "cache-control: stale-while-revalidate=86400, stale-while-error=86400"
| | "content-type: text/html; charset=utf-8"
| | "link: ; rel=\"https://api.w.org/\""
| | "server: apache"
| | "via: 1.1 varnish"
| | "fastly-debug-state: hit"
| | "fastly-debug-digest: c245efbf14778c681ce317da114c1a762199e1326323d07b531d765e97fc8695"
| | "content-length: 135495"
| | "accept-ranges: bytes"
| | "date: tue, 23 aug 2016 22:32:26 gmt"
| | "via: 1.1 varnish"
| | "age: 701"
| | "connection: close"
| | "x-served-by: cache-jfk8149-jfk, cache-den6020-den"
| | "x-cache: hit, hit"
| | "x-cache-hits: 51, 1"
| | "x-timer: s1471991546.614958,vs0,ve0"
| | "vary: accept-encoding"
|
| wrapper_type ................. "http"
| stream_type .................. "tcp_socket/ssl"
| mode ......................... "rb"
| unread_bytes ................. 0
| seekable ..................... false
| uri .......................... "http://www.wired.com/"
| timed_out .................... false
| blocked ...................... true
| eof .......................... false
```

For the most part it's the same data, except that stream_get_meta_data doesn't offer a way to include keys in wrapper_data without parsing through it manually.
Easy enough...
Code block #2.1/2.2

```php
$wd = $meta[$url]['wrapper_data'];
$warr = wrappertokeys($wd);
```

Where...

```php
function wrappertokeys($wd)
{
    $warr = array();
    foreach ($wd as $row) {
        // Split on the first ':'; HTTP allows optional whitespace after it,
        // so don't rely on ": " being the separator
        $pos = strpos($row, ':');
        if ($pos === false) {
            // no "Name: value" pair, e.g. the "http/1.1 200 ok" status line
            $warr[] = $row;
        } else {
            // $pos, $key and $value could be done with 1 preg_match
            $key   = substr($row, 0, $pos);
            $value = ltrim(substr($row, $pos + 1));
            // if the key doesn't exist yet, assign the value
            // (isset rather than empty, so a value of "0" isn't overwritten)
            if (!isset($warr[$key])) {
                $warr[$key] = $value;
            }
            // if the key points to an array, add the value to the array
            else if (is_array($warr[$key])) {
                $warr[$key][] = $value;
            }
            // if the key points to a string, swap it for an array of both values
            else {
                $warr[$key] = array($warr[$key], $value);
            }
        }
    }
    return $warr;
}
```

And the output is identical to get_headers($url, 1):
Excerpt of comparison #2

```
url .. "http://www.wired.com/"
headers
| 0 ............................ "http/1.1 200 ok"
| access-control-allow-origin .. "*"
| cache-control ................ "stale-while-revalidate=86400, stale-while-error=86400"
| content-type ................. "text/html; charset=utf-8"
| link ......................... "; rel=\"https://api.w.org/\""
| server ....................... "apache"
| via
| | "1.1 varnish"
| | "1.1 varnish"
|
| fastly-debug-state ........... "hit"
| fastly-debug-digest .......... "c245efbf14778c681ce317da114c1a762199e1326323d07b531d765e97fc8695"
| content-length ............... "135495"
| accept-ranges ................ "bytes"
| date ......................... "tue, 23 aug 2016 22:35:29 gmt"
| age .......................... "883"
| connection ................... "close"
| x-served-by .................. "cache-jfk8149-jfk, cache-den6027-den"
| x-cache ...................... "hit, hit"
| x-cache-hits ................. "51, 1"
| x-timer ...................... "s1471991729.021214,vs0,ve0"
| vary ......................... "accept-encoding"
w-arr
| 0 ............................ "http/1.1 200 ok"
| access-control-allow-origin .. "*"
| cache-control ................ "stale-while-revalidate=86400, stale-while-error=86400"
| content-type ................. "text/html; charset=utf-8"
| link ......................... "; rel=\"https://api.w.org/\""
| server ....................... "apache"
| via
| | "1.1 varnish"
| | "1.1 varnish"
|
| fastly-debug-state ........... "hit"
| fastly-debug-digest .......... "c245efbf14778c681ce317da114c1a762199e1326323d07b531d765e97fc8695"
| content-length ............... "135495"
| accept-ranges ................ "bytes"
| date ......................... "tue, 23 aug 2016 22:35:29 gmt"
| age .......................... "884"
| connection ................... "close"
| x-served-by .................. "cache-jfk8149-jfk, cache-den6021-den"
| x-cache ...................... "hit, hit"
| x-cache-hits ................. "51, 1"
| x-timer ...................... "s1471991729.173641,vs0,ve0"
| vary ......................... "accept-encoding"
```

Even after sorting out the keys, stream_get_meta_data is the champion:
Sample run times #2

```
"get_headers": 99.51 // seconds
"stream_get":  43.79 // seconds
```

Note: these tests are being run on a cheap shared server, hence the large variations in testing times. That being said, the gap between the two methods is highly consistent between tests.
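The comment in code block #2.2 notes that $pos, $key and $value could be obtained with a single preg_match. A sketch of that variant, assuming header lines have the form "Name: value" with optional whitespace after the colon; wrapperToKeysRegex is a hypothetical name. It checks for an existing key with isset(), since empty() would also treat a first value of "0" or "" as missing:

```php
<?php
// Hypothetical regex-based variant of wrappertokeys().
function wrapperToKeysRegex(array $wrapperData): array
{
    $out = [];
    foreach ($wrapperData as $row) {
        // "Name: value" lines; the whitespace after the colon is optional.
        if (!preg_match('/^([^:]+):\s*(.*)$/', $row, $m)) {
            // No colon (e.g. a status line such as "HTTP/1.1 200 OK"):
            // store it under the next numeric key, like get_headers($url, 1).
            $out[] = $row;
            continue;
        }
        [, $key, $value] = $m;
        if (!isset($out[$key])) {
            $out[$key] = $value;            // first occurrence: plain string
        } elseif (is_array($out[$key])) {
            $out[$key][] = $value;          // third+ occurrence: append
        } else {
            $out[$key] = [$out[$key], $value]; // second occurrence: promote to array
        }
    }
    return $out;
}
```

Repeated headers such as the two "via: 1.1 varnish" lines end up as an array under one key, which is the shape get_headers($url, 1) produces in the dumps above.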
Additional
For those of you who understand the C code behind PHP and feel you might be able to gain some insight from it, the function definitions can be found at:
and
'stream_get_meta_data' (PHP git)
Questions
How come stream_get_meta_data is so underrepresented (in searches and available code snippets) compared to get_headers? The way I've worded this invites opinions, but my intent is more along the lines of: "Is there something well-known and terrible about stream_get_meta_data that tends to deter people from using it?"
Similar to the previous question: are there any well-known, industry-agreed-upon pros and cons between the two? The kinds of things a more comprehensive understanding of CS would allude to. Perhaps get_headers is more secure/robust, and less susceptible to ne'er-do-wells and inconsistencies in server outputs? Or maybe get_headers is known to work in instances where stream_get_meta_data produces an error?
From what I can find, stream_get_meta_data has a couple of notes and warnings (... fopen), but nothing awful that can't be worked around.
So as long as it's safe and consistent, I'd like to incorporate it into my project, seeing as this operation is performed often, and cutting the run time in half would make a substantial difference.
Edit #1
I have since found a few URLs that are successful with get_headers but throw a warning with stream_get_meta_data:

```
PHP Warning: fopen(http://www.alealimay.com/): failed to open stream: HTTP request failed! HTTP/1.0 400 Bad Request
PHP Warning: fopen(http://www.thelovelist.net/): failed to open stream: HTTP request failed! HTTP/1.0 400 Bad Request
PHP Warning: fopen(http://www.bleedingcool.com/): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden
```

get_headers returns a 403 Forbidden status, even though I can paste the URLs into a browser and see that they're working sites.
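One thing that may be worth ruling out (an assumption, not something verified against these sites): as far as I know, PHP's http wrapper sends no User-Agent header unless the user_agent ini setting or context option is set, and some servers answer such requests with 400/403. Separately, the ignore_errors http context option makes fopen() return the stream for 4xx/5xx responses instead of failing, so the real status line still appears in wrapper_data. A sketch of a context combining both; makeHeadContext and the User-Agent string are placeholders:

```php
<?php
// Hypothetical helper: a HEAD context that (assumption) may avoid the 400/403
// failures above and keeps the stream open even for error statuses.
function makeHeadContext(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method'        => 'HEAD',
            // Send an explicit User-Agent; without this or the user_agent ini
            // setting, the http wrapper sends none (as I understand it).
            'user_agent'    => $userAgent,
            // Return the stream for 4xx/5xx responses instead of failing with
            // "failed to open stream", so the status can be inspected.
            'ignore_errors' => true,
        ],
    ]);
}
```

The context would be passed as fopen()'s fourth argument (or, on PHP 7.1+, as get_headers()'s third). With ignore_errors, a successful fopen() no longer implies a 2xx response, so the status line in wrapper_data has to be checked explicitly.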
I'm unsure what to make of both the breakdown of stream_get_meta_data and the incomplete headers from get_headers (which should include the redirects, and a final status_code = 200 for functioning sites).
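On the redirects: as far as I know, when the http wrapper follows redirects (its default), both get_headers($url, 1) and wrapper_data contain the headers of every response in the chain, with each status line under its own numeric key. A minimal sketch (finalStatusCode is a hypothetical name) that picks out the status code of the last response; it works on the keyed array from get_headers($url, 1) and on raw wrapper_data alike:

```php
<?php
// Hypothetical helper: return the status code of the final response in a
// header array, or null if no status line is found.
function finalStatusCode(array $headers): ?int
{
    $code = null;
    foreach ($headers as $key => $value) {
        // Status lines sit under integer keys; later ones belong to later
        // responses in the redirect chain, so keep overwriting $code.
        if (is_int($key) && is_string($value)
            && preg_match('#^HTTP/\S+\s+(\d{3})#i', $value, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}
```

For a redirected site this would report 200 for the final hop rather than the 301/302 of the first response.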
Much thanks, if you've made it this far.
Also, please comment if you down-vote, so I might be able to improve my question and learn for future cases.