Fixing "Simple HTML DOM file_get_html not working - is there any workaround?" (ok)

https://stackoverflow.com/questions/18667441/simple-html-dom-file-get-html-not-working-is-there-any-workaround

This now works:

<?php
include('simple_html_dom.php');
// Report all PHP errors
error_reporting(E_ALL);
$base = 'https://mediamart.vn';

$curl = curl_init();
// Skips TLS certificate verification; acceptable for a local test only
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_URL, $base);
curl_setopt($curl, CURLOPT_REFERER, $base);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
$str = curl_exec($curl);
curl_close($curl);

// Create a DOM object
$html_base = new simple_html_dom();
// Load HTML from a string
$html_base->load($str);

//get all category links
foreach($html_base->find('a') as $element) {
    echo "<pre>";
    print_r( $element->href );
    echo "</pre>";
}

$html_base->clear(); 
unset($html_base);

?>
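
The snippet above passes whatever `curl_exec()` returns straight to the parser, so a failed request (which returns `false`) is easy to miss. A minimal sketch of a more defensive fetch step follows. The helper name `fetch_page` and the timeout values are assumptions for illustration, not part of the original code. Unlike the snippet above, it leaves certificate verification on, which assumes the PHP installation has a usable CA bundle.

```php
<?php
// Hypothetical helper (not from the original snippet): fetch a URL with cURL
// and throw instead of silently handing `false` to the HTML parser.
function fetch_page(string $url): string
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10); // assumed value, adjust as needed
    curl_setopt($curl, CURLOPT_TIMEOUT, 20);        // assumed value, adjust as needed

    $body = curl_exec($curl);
    if ($body === false) {
        // curl_error() describes what went wrong (DNS, TLS, timeout, ...)
        throw new RuntimeException('cURL request failed: ' . curl_error($curl));
    }
    // The handle is released when $curl goes out of scope.
    return $body;
}

try {
    $str = fetch_page('https://mediamart.vn');
    echo strlen($str), " bytes fetched", PHP_EOL;
} catch (RuntimeException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

The returned string can then be passed to `$html_base->load($str)` exactly as in the snippet above.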


<?php
// Report all PHP errors (see changelog)
error_reporting(E_ALL);

include('inc/simple_html_dom.php');

    //base url
    $base = 'https://play.google.com/store/apps';

    //home page HTML
    $html_base = file_get_html( $base );

    //get all category links
    foreach($html_base->find('a') as $element) {
        echo "<pre>";
        print_r( $element->href );
        echo "</pre>";
    }

    $html_base->clear(); 
    unset($html_base);

?>

I have the above code and I'm trying to get certain elements of the Play Store page but it isn't returning anything. Is it possible that certain PHP functions might be disabled on the server to stop that?

The above code works perfectly on other sites.

Is there any workaround?

Tags: php, html-parsing, file-get-contents, simple-html-dom

Asked Sep 6 '13 at 22:20 by Altin; edited Mar 31 '14 at 23:45 by JasonMArcher


4 Answers


As I said, your example is working fine for me... But try this way using curl instead:

//base url
$base = 'https://play.google.com/store/apps';

$curl = curl_init();
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_URL, $base);
curl_setopt($curl, CURLOPT_REFERER, $base);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
$str = curl_exec($curl);
curl_close($curl);

// Create a DOM object
$html_base = new simple_html_dom();
// Load HTML from a string
$html_base->load($str);

//get all category links
foreach($html_base->find('a') as $element) {
    echo "<pre>";
    print_r( $element->href );
    echo "</pre>";
}

$html_base->clear(); 
unset($html_base);

It gets all the links as expected.

Also make sure the php_openssl and php_curl extensions are installed and enabled.

Answered Sep 7 '13 at 0:45 by Enissay

  • Wow, thank you. As you said, I just needed to activate the "php_openssl" extension and it works now :) I'm using WAMP Server on Windows and it was inactive by default. Thanks man! – Altin, Sep 7 '13 at 3:19
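
To confirm whether the extensions mentioned in this answer are actually loaded, something like the following can be run on the server. This is a small sketch; the helper name `missing_extensions` is made up for illustration. Note that `extension_loaded()` expects the internal extension name (`openssl`, `curl`), not the DLL file name.

```php
<?php
// Hypothetical helper: return the names from $required that are NOT loaded.
function missing_extensions(array $required): array
{
    return array_values(array_filter(
        $required,
        function ($name) {
            return !extension_loaded($name);
        }
    ));
}

$missing = missing_extensions(['openssl', 'curl']);
if ($missing === []) {
    echo "openssl and curl are both loaded", PHP_EOL;
} else {
    echo "Missing extensions: ", implode(', ', $missing), PHP_EOL;
}
```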


Remove the leading semicolon from the extension line in php.ini, then restart the Apache server so that the module is loaded:

; Windows Extensions
...
;extension=php_openssl.dll
...

Answered Aug 23 '16 at 2:05 by Chitsai Yeh

You must set "allow_url_fopen" to On in "php.ini" to allow accessing files via HTTP or FTP. Some hosting vendors disable PHP's "allow_url_fopen" flag for security reasons.

Answered Jan 7 '15 at 23:01 by shahil
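
Whether this setting is the culprit can be checked from PHP itself. The sketch below (the function name `url_fopen_status` is hypothetical) reports the current value of `allow_url_fopen` and whether the `https` stream wrapper is registered, which PHP only does when the openssl extension is loaded.

```php
<?php
// Hypothetical diagnostic: can file_get_contents() / file_get_html() open https:// URLs here?
function url_fopen_status(): array
{
    return [
        // ini_get() returns the setting as a string (or false), so normalise it to a bool
        'allow_url_fopen' => filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN),
        // the https wrapper is only available when the openssl extension is loaded
        'https_wrapper' => in_array('https', stream_get_wrappers(), true),
    ];
}

foreach (url_fopen_status() as $name => $enabled) {
    echo $name, ': ', $enabled ? 'yes' : 'no', PHP_EOL;
}
```

If either line prints `no`, `file_get_html()` on an `https://` URL is likely to fail, and the cURL approach from the first answer is the usual fallback. `allow_url_fopen` is a system-level setting, so it has to be changed in php.ini rather than with `ini_set()`.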

$post = curl_init();
curl_setopt($post, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($post, CURLOPT_AUTOREFERER, true);
curl_setopt($post, CURLOPT_HEADER, false);
curl_setopt($post, CURLOPT_RETURNTRANSFER, true);
curl_setopt($post, CURLOPT_URL, $website);
curl_setopt($post, CURLOPT_POST, true);
curl_setopt($post, CURLOPT_POSTFIELDS, "regno=$Number");
curl_setopt($post, CURLOPT_FOLLOWLOCATION, true);
$curlresponse = curl_exec($post);
// The status code is only available after the request has run
$httpCode = curl_getinfo($post, CURLINFO_HTTP_CODE);
curl_close($post);
$dom = new DOMDocument();
$dom->loadHTML($curlresponse);

The code above produces: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseStartTag: misplaced

The URL is: http://www.annauniv.edu/cgi-bin/result/cgrade.pl?regno=11210104001
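
The `htmlParseStartTag` message is a libxml parse warning, which `DOMDocument::loadHTML()` emits when the markup is not well formed; the document is usually still parsed. One common way to deal with it, sketched below, is to have libxml buffer its errors with `libxml_use_internal_errors()` and inspect them afterwards. The helper name `load_html_quietly` and the sample markup are assumptions for illustration, not taken from the post.

```php
<?php
// Hypothetical helper: parse possibly malformed HTML and collect libxml
// messages instead of letting them surface as PHP warnings.
function load_html_quietly(string $html): array
{
    $previous = libxml_use_internal_errors(true); // buffer parse errors
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $errors = libxml_get_errors();                // array of LibXMLError objects
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the earlier setting
    return [$dom, $errors];
}

// Deliberately sloppy sample markup (made up for this example)
[$dom, $errors] = load_html_quietly('<p>Unclosed <b>tag<p><a href="/x">link</a>');

foreach ($dom->getElementsByTagName('a') as $a) {
    echo $a->getAttribute('href'), PHP_EOL;
}
echo count($errors), " parse message(s) collected", PHP_EOL;
```

In the code from the post, `$curlresponse` would take the place of the sample string, after first checking that `curl_exec()` did not return `false`.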
