get headers - PHP - Differences between `get_headers` and `stream_get_meta_data`

intro / disclaimer

Decent chunks of the outputs can largely be ignored. It's still a bit of a read, but I'm trying to be thorough in my analysis and questioning. If you're familiar with stream_get_meta_data, it's fine to skip to the "questions" at the end.

Other than in the docs, I'm having trouble finding out much about PHP's stream_get_meta_data. Its overall functionality is not vastly different from that of PHP's get_headers, but I cannot for the life of me find any comparisons between the two, or pros/cons of the former.

the setup

Up until this point, I've used PHP's get_headers to verify the validity of a URL. The downside is that get_headers is notoriously slow. Understandably, much of the latency is directly due to the server hosting the site of interest, but maybe the method is overly robust, or something else is slowing it down.
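As a rough illustration of what "verifying a URL" from header output can look like, here is a minimal sketch that pulls the final status code out of a header array such as the one get_headers returns. The helper name `finalStatusCode` is hypothetical, and it assumes status lines of the form `HTTP/1.1 200 OK` sitting under integer keys, with the last one belonging to the final response after any redirects.

```php
<?php
/**
 * Hypothetical helper: returns the status code of the last status line
 * found in a header array, or null if none is present.
 */
function finalStatusCode(array $headers): ?int
{
    $code = null;
    foreach ($headers as $key => $line) {
        // Status lines keep integer keys (0, 1, ...); named headers in
        // get_headers($url, 1) output use string keys and are skipped here.
        if (is_int($key) && is_string($line)
            && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#i', $line, $m)) {
            $code = (int) $m[1]; // later status lines overwrite earlier ones
        }
    }
    return $code;
}

// Sample data only; no request is made here.
$sample = array(
    0 => 'HTTP/1.1 301 Moved Permanently',
    'Location' => 'http://www.example.com/',
    1 => 'HTTP/1.1 200 OK',
    'Content-Type' => 'text/html; charset=utf-8',
);
var_dump(finalStatusCode($sample) === 200); // is the final response a 200?
```

The same helper should also work on the numeric-only array from `get_headers($url)` and on the `wrapper_data` list, since lines that are not status lines simply fail the pattern.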

There are plenty of links that recommend using cURL, claiming it is faster, but I've run side-by-side, timed tests of both, and get_headers has come out on top, often by a factor of 1.5 or 2.

I've yet to see any solutions using stream_get_meta_data, and only stumbled upon it for the first time today. I've exhausted my Google skills, without much luck. But, in the interest of optimizing my scheme, I ran some tests.

the testing

Comparisons between get_headers and stream_get_meta_data were run using a list of 106 current (i.e. live, valid, status=200) URLs:

code block #1

    // URLs in the format "http://www.domain.com"
    $urls = array('...', '...', '...'); // *106 URLs

    // get_headers
    $start = microtime(true);
    foreach ($urls as $url) {
        try {
            // unfortunately, get_headers does not offer a context argument
            stream_context_set_default(array('http' => array('method' => "HEAD")));
            $headers[] = @get_headers($url, 1);
            // restore the default method for any later requests
            stream_context_set_default(array('http' => array('method' => "GET")));
        } catch (Exception $e) {
            continue;
        }
    }
    $end1 = microtime(true) - $start;

    // stream_get_meta_data
    $cont = stream_context_create(array('http' => array('method' => "HEAD")));
    $start = microtime(true);
    foreach ($urls as $url) {
        try {
            $fp = fopen($url, 'rb', false, $cont);
            if (!$fp) {
                continue;
            }
            $streams[] = stream_get_meta_data($fp);
        } catch (Exception $e) {
            continue;
        }
    }
    $end2 = microtime(true) - $start;

And the results I'm getting have stream_get_meta_data coming out on top, 90% of the time, or more. Sometimes the times are nearly identical, but more often than not stream_get_meta_data has the shorter run-time.

run times #1

    "get_headers": 112.23 // seconds
    "stream_get":   42.61 // seconds

With the [stringified] outputs of the 2 being something like:

excerpt of comparison #1

    url  ..  "http://www.wired.com/"

    get_headers
    |    0  ............................  "http/1.1 200 ok"
    |    access-control-allow-origin  ..  "*"
    |    cache-control  ................  "stale-while-revalidate=86400, stale-while-error=86400"
    |    content-type  .................  "text/html; charset=utf-8"
    |    link  .........................  "; rel=\"https://api.w.org/\""
    |    server  .......................  "apache"
    |    via
    |    |    "1.1 varnish"
    |    |    "1.1 varnish"
    |
    |    fastly-debug-state  ...........  "hit"
    |    fastly-debug-digest  ..........  "c245efbf14778c681ce317da114c1a762199e1326323d07b531d765e97fc8695"
    |    content-length  ...............  "135495"
    |    accept-ranges  ................  "bytes"
    |    date  .........................  "tue, 23 aug 2016 22:32:26 gmt"
    |    age  ..........................  "701"
    |    connection  ...................  "close"
    |    x-served-by  ..................  "cache-jfk8149-jfk, cache-den6024-den"
    |    x-cache  ......................  "hit, hit"
    |    x-cache-hits  .................  "51, 1"
    |    x-timer  ......................  "s1471991546.459931,vs0,ve0"
    |    vary  .........................  "accept-encoding"

    stream_get
    |    wrapper_data
    |    |    "http/1.1 200 ok"
    |    |    "access-control-allow-origin: *"
    |    |    "cache-control: stale-while-revalidate=86400, stale-while-error=86400"
    |    |    "content-type: text/html; charset=utf-8"
    |    |    "link: ; rel=\"https://api.w.org/\""
    |    |    "server: apache"
    |    |    "via: 1.1 varnish"
    |    |    "fastly-debug-state: hit"
    |    |    "fastly-debug-digest: c245efbf14778c681ce317da114c1a762199e1326323d07b531d765e97fc8695"
    |    |    "content-length: 135495"
    |    |    "accept-ranges: bytes"
    |    |    "date: tue, 23 aug 2016 22:32:26 gmt"
    |    |    "via: 1.1 varnish"
    |    |    "age: 701"
    |    |    "connection: close"
    |    |    "x-served-by: cache-jfk8149-jfk, cache-den6020-den"
    |    |    "x-cache: hit, hit"
    |    |    "x-cache-hits: 51, 1"
    |    |    "x-timer: s1471991546.614958,vs0,ve0"
    |    |    "vary: accept-encoding"
    |
    |    wrapper_type  .................  "http"
    |    stream_type  ..................  "tcp_socket/ssl"
    |    mode  .........................  "rb"
    |    unread_bytes  .................  0
    |    seekable  .....................  false
    |    uri  ..........................  "http://www.wired.com/"
    |    timed_out  ....................  false
    |    blocked  ......................  true
    |    eof  ..........................  false

For the most part, it's the same data, with the exception that stream_get_meta_data doesn't offer any way to key the wrapper_data by header name, without parsing through it manually.

easy enough...

code block #2.1/2.2

    $wd = $meta[$url]['wrapper_data'];
    $warr = wrappertokeys($wd);

where...

    function wrappertokeys($wd) {
        $warr = array();
        foreach ($wd as $row) {
            // *assuming* the separator is ": " (might be a colon without a space?)
            $pos = strpos($row, ': ');

            if ($pos === false) {
                // no separator (e.g. the status line): keep under a numeric key
                $warr[] = $row;
            } else {
                // $pos, $key, and $value could be done with one preg_match
                $key = substr($row, 0, $pos);
                $value = substr($row, ($pos + 2));

                // if the key doesn't exist yet, assign the value
                // (isset rather than empty, so a value like "0" isn't overwritten)
                if (!isset($warr[$key])) {
                    $warr[$key] = $value;
                }

                // if the key points to an array, add the value to the array
                else if (is_array($warr[$key])) {
                    $warr[$key][] = $value;
                }

                // if the key points to a string, swap the value for an array
                else {
                    $warr[$key] = array($warr[$key], $value);
                }
            }
        }

        return $warr;
    }
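The comment in wrappertokeys wonders about headers that use a colon without a following space. One way to sidestep that, sketched below under the assumption that every header row looks like `name: value` or `name:value`, is to split on the first colon and trim both halves. The function name `wrapperToKeysLoose` is made up for illustration, and the sample rows are adapted from the wired.com excerpt above.

```php
<?php
/** Hypothetical variant: keys wrapper_data rows by header name. */
function wrapperToKeysLoose(array $rows): array
{
    $out = array();
    foreach ($rows as $row) {
        $parts = explode(':', $row, 2); // split on the first colon only
        if (count($parts) < 2) {
            $out[] = $row;              // status line: keep under a numeric key
            continue;
        }
        $key   = trim($parts[0]);
        $value = trim($parts[1]);
        if (!array_key_exists($key, $out)) {
            $out[$key] = $value;                    // first occurrence
        } elseif (is_array($out[$key])) {
            $out[$key][] = $value;                  // third or later occurrence
        } else {
            $out[$key] = array($out[$key], $value); // second occurrence
        }
    }
    return $out;
}

$rows = array(
    'http/1.1 200 ok',
    'server: apache',
    'via: 1.1 varnish',
    'content-length:135495', // no space after the colon
    'via: 1.1 varnish',
);
print_r(wrapperToKeysLoose($rows));
```

Splitting with a limit of 2 keeps colons inside values (dates, URLs) intact, which is the main reason for preferring it over a plain `explode(':', $row)`.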

And the output is identical to that of get_headers($url, 1):

excerpt of comparison #2

    url  ..  "http://www.wired.com/"

    headers
    |    0  ............................  "http/1.1 200 ok"
    |    access-control-allow-origin  ..  "*"
    |    cache-control  ................  "stale-while-revalidate=86400, stale-while-error=86400"
    |    content-type  .................  "text/html; charset=utf-8"
    |    link  .........................  "; rel=\"https://api.w.org/\""
    |    server  .......................  "apache"
    |    via
    |    |    "1.1 varnish"
    |    |    "1.1 varnish"
    |
    |    fastly-debug-state  ...........  "hit"
    |    fastly-debug-digest  ..........  "c245efbf14778c681ce317da114c1a762199e1326323d07b531d765e97fc8695"
    |    content-length  ...............  "135495"
    |    accept-ranges  ................  "bytes"
    |    date  .........................  "tue, 23 aug 2016 22:35:29 gmt"
    |    age  ..........................  "883"
    |    connection  ...................  "close"
    |    x-served-by  ..................  "cache-jfk8149-jfk, cache-den6027-den"
    |    x-cache  ......................  "hit, hit"
    |    x-cache-hits  .................  "51, 1"
    |    x-timer  ......................  "s1471991729.021214,vs0,ve0"
    |    vary  .........................  "accept-encoding"

    w-arr
    |    0  ............................  "http/1.1 200 ok"
    |    access-control-allow-origin  ..  "*"
    |    cache-control  ................  "stale-while-revalidate=86400, stale-while-error=86400"
    |    content-type  .................  "text/html; charset=utf-8"
    |    link  .........................  "; rel=\"https://api.w.org/\""
    |    server  .......................  "apache"
    |    via
    |    |    "1.1 varnish"
    |    |    "1.1 varnish"
    |
    |    fastly-debug-state  ...........  "hit"
    |    fastly-debug-digest  ..........  "c245efbf14778c681ce317da114c1a762199e1326323d07b531d765e97fc8695"
    |    content-length  ...............  "135495"
    |    accept-ranges  ................  "bytes"
    |    date  .........................  "tue, 23 aug 2016 22:35:29 gmt"
    |    age  ..........................  "884"
    |    connection  ...................  "close"
    |    x-served-by  ..................  "cache-jfk8149-jfk, cache-den6021-den"
    |    x-cache  ......................  "hit, hit"
    |    x-cache-hits  .................  "51, 1"
    |    x-timer  ......................  "s1471991729.173641,vs0,ve0"
    |    vary  .........................  "accept-encoding"

Even after sorting out the keys, stream_get_meta_data is the champion:

sample run times #2

    "get_headers": 99.51 // seconds
    "stream_get":  43.79 // seconds

Note: these tests are being run on a cheap shared server - hence the large variations in testing times. That being said, the gap between the 2 methods is highly consistent between tests.
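Because shared-host timings are noisy, it can help to repeat each measurement and compare medians rather than single runs. The sketch below is one way to do that; `timeRuns` and `median` are hypothetical helpers, and the closure passed in merely stands in for either of the loops from code block #1.

```php
<?php
/** Runs $task $runs times and returns each duration in seconds. */
function timeRuns(callable $task, int $runs = 5): array
{
    $durations = array();
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);
        $task();
        $durations[] = microtime(true) - $start;
    }
    return $durations;
}

/** Median of a non-empty list of numbers. */
function median(array $values): float
{
    sort($values);
    $n   = count($values);
    $mid = intdiv($n, 2);
    return $n % 2 ? (float) $values[$mid]
                  : ($values[$mid - 1] + $values[$mid]) / 2;
}

// Placeholder workload; swap in the get_headers or fopen loop to compare them.
$durations = timeRuns(function () { usleep(1000); }, 3);
printf("median: %.4f s over %d runs\n", median($durations), count($durations));
```

Alternating the order of the two methods between runs would also reduce the chance that DNS or connection caching favours whichever one goes second.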

additional

For those of you who understand the C code behind PHP, and feel you might be able to gain some insight from it, the function definitions can be found at:

'get_headers' (php git)

and

'stream_get_meta_data' (php git)

questions

How come stream_get_meta_data is so underrepresented (in searches and available code snippets) compared to get_headers?

The way I've worded that leads to opinions, but my intent is more along the lines of: "Is there something well-known and terrible about stream_get_meta_data that tends to deter people from using it?"

Similar to the previous question, are there well-known, industry agreed-upon pros and cons between the two? The kinds of things that a more comprehensive understanding of CS would allude to. Perhaps get_headers is more secure/robust, and less susceptible to ne'er-do-wells and inconsistencies in server outputs? Or maybe get_headers is known to work in instances where stream_get_meta_data produces an error?

From what I can find, stream_get_meta_data does have a couple of notes and warnings (... fopen), but nothing so awful that it can't be worked around.

So long as it's safe and consistent, I'd like to incorporate it into my project, seeing as this operation is performed often, and cutting the run time in half would make a substantial difference.

edit #1

I have since found a few URLs that are successful with get_headers but throw a warning with stream_get_meta_data:

    php warning:  fopen(http://www.alealimay.com/): failed to open stream: http request failed! http/1.0 400 bad request
    php warning:  fopen(http://www.thelovelist.net/): failed to open stream: http request failed! http/1.0 400 bad request
    php warning:  fopen(http://www.bleedingcool.com/): failed to open stream: http request failed! http/1.1 403 forbidden

get_headers returns a 403 forbidden status, even though you can paste the URLs into a browser and see that they are working sites.

I'm unsure about this: both the breakdown of stream_get_meta_data, and the incomplete header from get_headers (it should include all redirects and a final status_code = 200 for functioning sites).
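One thing that may be worth trying for the failing URLs, though it is untested against these particular sites: by default the http wrapper makes fopen fail on error statuses, and the `ignore_errors` context option asks it to hand back the stream anyway, so wrapper_data can still be inspected. Some servers also reject requests that carry no User-Agent, which might explain a 400 or 403, but that is only a guess. The helper name `headContext` and the user-agent string below are placeholders.

```php
<?php
/** Hypothetical helper: builds a HEAD context that tolerates error statuses. */
function headContext(float $timeout = 5.0)
{
    return stream_context_create(array(
        'http' => array(
            'method'        => 'HEAD',
            'ignore_errors' => true,     // return the stream even on 4xx/5xx
            'timeout'       => $timeout, // read timeout in seconds
            'user_agent'    => 'example-link-checker/1.0', // placeholder value
        ),
    ));
}

$ctx  = headContext();
$opts = stream_context_get_options($ctx);
echo $opts['http']['method'], "\n";

// Usage sketch (performs a network request, so it is left commented out):
// $fp = @fopen('http://www.example.com/', 'rb', false, $ctx);
// if ($fp) {
//     $meta = stream_get_meta_data($fp);
//     fclose($fp);
//     print_r($meta['wrapper_data']);
// }
```

With `ignore_errors` set, a failed fopen should then mean a connection-level problem rather than an HTTP error status, so the status line in wrapper_data becomes the thing to check.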

Much thanks, if you've made it this far.

Also, please comment if you down-vote, so that I might be able to improve the question, and can learn for future cases.
