Separate a certain piece of data from HTML with given start and end points
I am learning screen scraping in C#, and I was wondering how I can extract specific pieces of the HTML I gather.
I am using the HtmlAgilityPack and ScrapySharp libraries for scraping. With this code I can retrieve an HTML page:
WebPage PageResult = Browser.NavigateToPage(new Uri("http://localhost"));
Console.WriteLine(PageResult);
Of course, I get back the whole source with all of its markup, but what if I only wanted to capture the data between <h2></h2> tags and omit everything else?
My very simple-minded pseudocode would be:
If the result reads <h2>
    trim everything before it
    start writing out what follows
If the result reads </h2>
    stop writing
    trim everything that comes after
My main question is how to express this rule: when I read <h2>, trim everything before it and write out the data that follows; when </h2> appears, stop and trim the rest of the result.
c# html
edited Nov 25 '18 at 13:42 by Shayer
asked Nov 25 '18 at 13:31 by Shayer
1 Answer
There are a few ways you can achieve this. One would be to read the page as XML and parse out the data you are looking for.
This can be done with classes such as:
XElement
XmlElement
XDocument
etc.
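As a rough illustration of the XML route, here is a minimal sketch using XDocument. The sample markup, the class name H2Xml, and the method name ExtractH2Text are made up for this example. It assumes the page is well-formed XML with no default namespace; most real-world HTML is not, in which case XDocument.Parse throws an XmlException.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class H2Xml
{
    // Hypothetical helper: returns the text of every <h2> element.
    // Assumes the input is well-formed XML without a default namespace.
    public static List<string> ExtractH2Text(string markup)
    {
        XDocument doc = XDocument.Parse(markup);
        var result = new List<string>();

        // Descendants("h2") walks the whole tree, so everything
        // outside the <h2> elements is simply never visited.
        foreach (XElement h2 in doc.Descendants("h2"))
        {
            result.Add(h2.Value);
        }
        return result;
    }

    public static void Main()
    {
        // Made-up sample page for illustration only.
        string markup =
            "<html><body><h1>Title</h1><h2>First</h2><p>text</p><h2>Second</h2></body></html>";

        foreach (string text in ExtractH2Text(markup))
        {
            Console.WriteLine(text);
        }
    }
}
```

No manual trimming is needed here: the parser already knows where each element starts and ends.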
The second way would be to use a third-party library like HtmlAgilityPack, which also supports XPath:
var nodes = doc.DocumentNode.SelectNodes("//form//input");
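Applied to the <h2> case from the question, the same idea might look like the sketch below. The sample markup and the class name H2Html are made up for this example, and in practice the string passed to LoadHtml would be whatever markup the scraper returned. It assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using HtmlAgilityPack;

public static class H2Html
{
    public static void Main()
    {
        // Made-up sample page; note the unclosed <p>, which
        // HtmlAgilityPack tolerates but an XML parser would reject.
        string html =
            "<html><body><h1>Title</h1><h2>First</h2><p>text<h2>Second</h2></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//h2" selects every <h2> element anywhere in the document.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//h2");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (nodes == null)
        {
            Console.WriteLine("No <h2> elements found.");
            return;
        }

        foreach (HtmlNode node in nodes)
        {
            // InnerText is the content between <h2> and </h2>, without the tags.
            Console.WriteLine(node.InnerText.Trim());
        }
    }
}
```

This covers the "trim before, write, stop at the closing tag" rule from the question without any string slicing, since the XPath query only returns the matching elements.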
answered Nov 25 '18 at 13:52 by mahlatse