Can I automate browsing a dynamic website without opening a browser?

I've started automating tasks on the web using Python.
I have tried requests, urllib3, and requests-html, but they don't give me the right elements, because they fetch only the initial HTML, not the version updated by JavaScript.
Some people recommended Selenium, but it opens a browser through the WebDriver.
I need a way to get elements after they have been updated, and possibly after they have been updated a second time.
I don't want to open a browser because my script runs on a script-hosting service.

python selenium web-scraping

edited Nov 25 '18 at 16:22 by Al Imran
asked Nov 25 '18 at 16:15 by Funris

  • Could you please share a Minimal, Complete, and Verifiable example? Where is your code, and a test URL or HTML?

    – QHarr
    Nov 25 '18 at 17:32

2 Answers

I would recommend that you look into the --headless option in WebDriver, but it will probably not work for you: headless mode still requires the browser to be installed, because WebDriver uses the browser's rendering engine ("headless" only means it does not start the UI). Since your hosting service probably does not have the browser executables installed, this will not work.
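For hosts where a browser can be installed, a headless run that also waits for an element to change (including a second update, as asked in the question) might look like the sketch below. It assumes Selenium 4 plus Chrome and a matching chromedriver; the helper names text_changed and texts_after_updates and the locator are hypothetical. WebDriverWait.until() keeps calling a condition until it returns a truthy value or the timeout expires.

```python
from typing import Callable, List, Tuple

Locator = Tuple[str, str]  # (strategy, value), e.g. ("css selector", "p.jstest")


def text_changed(locator: Locator, old_text: str) -> Callable:
    """Hypothetical wait condition: return the element's text once it differs from old_text.

    Returns False (keep waiting) while the text is unchanged. An empty new text
    is also falsy, so WebDriverWait would keep polling in that case.
    """
    def condition(driver):
        text = driver.find_element(*locator).text
        return text if text != old_text else False
    return condition


def texts_after_updates(url: str, locator: Locator, updates: int = 2,
                        timeout: float = 10.0) -> List[str]:
    """Load url in headless Chrome and record the element's text through `updates` changes."""
    # Imported lazily so text_changed() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")  # no window, but Chrome itself must still be installed
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)  # raises TimeoutException if a condition never holds
        # WebDriverWait ignores NoSuchElementException by default, so this waits for the element.
        current = wait.until(lambda d: d.find_element(*locator)).text
        seen = [current]
        for _ in range(updates):
            current = wait.until(text_changed(locator, current))
            seen.append(current)
        return seen
    finally:
        driver.quit()  # always shut the browser down, even after a timeout
```

For example, texts_after_updates("https://example.com/", ("css selector", "p.jstest")) would return the initial text followed by the next two distinct values; the URL and selector are placeholders.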



Without a rendering engine you will not get the rendered (JavaScript-enhanced) web page; that simply cannot be done in pure Python.
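To make the point concrete, the sketch below parses a made-up server response with nothing but the standard library's html.parser. A client that does not execute JavaScript (requests, urllib3) only ever has this markup to work with. The script body is just text to the parser and is never run, so the paragraph still holds the server's placeholder rather than the value JavaScript would set. The HTML and the helper names are illustrative assumptions.

```python
from html.parser import HTMLParser

# Made-up server response: a browser would run the script and replace the text.
RAW_HTML = """
<html><body>
  <p class="jstest">placeholder from the server</p>
  <script>document.querySelector('.jstest').textContent = 'set by JavaScript';</script>
</body></html>
"""


class ClassTextCollector(HTMLParser):
    """Collect the text of every <p> element carrying the given CSS class."""

    def __init__(self, css_class: str):
        super().__init__()
        self.css_class = css_class
        self.inside = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and self.css_class in classes:
            self.inside = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.inside = False

    def handle_data(self, data):
        # Data may arrive in several chunks, so append rather than assign.
        if self.inside:
            self.texts[-1] += data


def paragraph_texts(html: str, css_class: str) -> list:
    parser = ClassTextCollector(css_class)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.texts]


# The <script> body is only text to the parser; nothing executes it.
print(paragraph_texts(RAW_HTML, "jstest"))  # ['placeholder from the server']
```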



One option would be a service like Sauce Labs (I am not affiliated, but I am a happy user), which runs browsers on its infrastructure and lets you control them via its API. You can then run Selenium scripts that fetch the rendered HTML/JS content via RemoteWebDriver and process the results on your own server.
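A RemoteWebDriver session follows the same pattern as a local one; only the driver construction changes. The sketch below assumes Selenium 4. The endpoint comes from a hypothetical SELENIUM_REMOTE_URL environment variable, because the real URL and credentials are provider-specific (check the provider's own documentation for them).

```python
import os
from typing import Mapping


def remote_url(env: Mapping[str, str] = os.environ) -> str:
    """Return the remote WebDriver endpoint from a hypothetical SELENIUM_REMOTE_URL variable."""
    url = env.get("SELENIUM_REMOTE_URL")
    if not url:
        raise RuntimeError("set SELENIUM_REMOTE_URL to your provider's WebDriver endpoint")
    return url


def fetch_remotely(page_url: str) -> str:
    """Render page_url in a browser hosted elsewhere and return the resulting HTML."""
    # Lazy import: only this function needs Selenium (4.x assumed).
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    driver = webdriver.Remote(command_executor=remote_url(), options=options)
    try:
        driver.get(page_url)
        return driver.page_source  # rendered DOM, returned to your own server for parsing
    finally:
        driver.quit()  # release the remote browser session
```

The returned string can be fed to BeautifulSoup (or any HTML parser) on your hosting service, which never needs a browser of its own.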

edited Nov 25 '18 at 17:53
answered Nov 25 '18 at 16:23 by Eike Pierstorff

Here is my solution to your problem.



Beautiful Soup doesn't mimic a client. JavaScript is code that runs on the client. With Python, we simply make a request to the server and get the server's response, which of course includes the JavaScript, but it is the browser that reads and runs that JavaScript. So we need to do that ourselves. There are many ways to do this: if you're on Mac or Linux, you can set up dryscrape, or we can do essentially what dryscrape does using PyQt4.



    import sys

    import bs4 as bs
    from PyQt4.QtCore import QUrl
    from PyQt4.QtGui import QApplication
    from PyQt4.QtWebKit import QWebPage


    class Client(QWebPage):
        """Loads a URL in a QWebPage (no window is shown) and blocks until loading finishes."""

        def __init__(self, url):
            self.app = QApplication(sys.argv)  # a QApplication must exist before any QWebPage
            super().__init__()
            self.loadFinished.connect(self.on_page_load)
            self.mainFrame().load(QUrl(url))
            self.app.exec_()  # run Qt's event loop until on_page_load() quits it

        def on_page_load(self):
            self.app.quit()


    url = 'https://pythonprogramming.net/parsememcparseface/'
    client_response = Client(url)
    # toHtml() serializes the DOM as it stands once loading has finished,
    # including changes made by scripts that ran during the load
    source = client_response.mainFrame().toHtml()
    soup = bs.BeautifulSoup(source, 'lxml')
    js_test = soup.find('p', class_='jstest')
    print(js_test.text)
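PyQt4 and its QtWebKit module are no longer maintained and are often hard to install on current Python versions, so the same idea could be ported to PyQt5's Qt WebEngine. This is an untested sketch under stated assumptions: it needs the PyQt5 and PyQtWebEngine packages, and Qt WebEngine embeds Chromium, so the host still needs the system libraries a browser engine depends on. Unlike QWebFrame.toHtml() above, QWebEnginePage.toHtml() is asynchronous and delivers the HTML to a callback. The function name render_html is hypothetical.

```python
import sys


def render_html(url: str) -> str:
    """Load url in a Qt WebEngine page (no window is shown) and return its HTML.

    Assumes PyQt5 and PyQtWebEngine are installed. On a server without a
    display, setting QT_QPA_PLATFORM=offscreen may be required.
    """
    # QtWebEngineWidgets must be imported before the QApplication is created.
    from PyQt5.QtWebEngineWidgets import QWebEnginePage
    from PyQt5.QtCore import QUrl
    from PyQt5.QtWidgets import QApplication

    app = QApplication.instance() or QApplication(sys.argv)
    page = QWebEnginePage()
    result = {"html": ""}

    def on_html(html):
        # toHtml() is asynchronous: the HTML arrives here, then we stop the loop.
        result["html"] = html
        app.quit()

    # loadFinished(bool) fires when loading ends (ok is False on failure);
    # either way, request the current DOM so the event loop always exits.
    page.loadFinished.connect(lambda ok: page.toHtml(on_html))
    page.load(QUrl(url))
    app.exec_()
    return result["html"]
```

The returned string can then be passed to BeautifulSoup exactly as in the PyQt4 version.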


In case you want to use dryscrape instead:



    import bs4 as bs
    import dryscrape

    sess = dryscrape.Session()
    sess.visit('https://pythonprogramming.net/parsememcparseface/')
    source = sess.body()  # HTML of the page after its JavaScript has run

    soup = bs.BeautifulSoup(source, 'lxml')
    js_test = soup.find('p', class_='jstest')
    print(js_test.text)

edited Nov 25 '18 at 16:37
answered Nov 25 '18 at 16:31 by Terry Craddock

  • I already tried PyQt4, but I always get a "No module named PyQt4" error.

    – Funris
    Nov 26 '18 at 6:29

  • pip install PyQt4 works like a charm for me.

    – Terry Craddock
    Nov 26 '18 at 16:09
