Separate a certain piece of data from HTML with given start and end points

I am learning screen scraping in C#, and I was wondering how I can separate out certain pieces of the HTML I gather.

I am using the HtmlAgilityPack and ScrapySharp libraries for scraping, and with this code I can retrieve an HTML page:

WebPage PageResult = Browser.NavigateToPage(new Uri("http://localhost"));

Console.WriteLine(PageResult);

Of course, I get back the whole source with all of its markup, but what if I wanted to capture only the data between <h2></h2> tags and omit everything else?

My very simple-minded pseudocode would be:

If result reads <h2>
    trim everything before it
    start writing out what comes after
If result reads </h2>
    stop writing
    trim anything that comes after

My main question is: how do I express the rule that, when <h2> is read, everything before it is trimmed and the data after it is written out, and that when </h2> appears, writing stops and the rest of the result is trimmed?

edited Nov 25 '18 at 13:42

asked Nov 25 '18 at 13:31

Shayer

c# html


1 Answer

There are a few ways you can achieve this. One would be to read the page as XML and parse out the data you are looking for; note that this only works when the page is well-formed XML (XHTML).

This can be done with, for example:
XElement
XmlElement
XDocument
etc.
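As a rough illustration of the XML route, here is a minimal sketch using XDocument from System.Xml.Linq. The class and method names (XmlHeadingReader, ReadH2) and the sample markup are made up for this example, and it assumes the page really is well-formed; typical real-world HTML often is not, in which case XDocument.Parse throws.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlHeadingReader
{
    // Hypothetical helper: returns the text of every <h2> element
    // in a well-formed XHTML string.
    public static List<string> ReadH2(string xhtml)
    {
        // Throws an XmlException if the markup is not well-formed XML.
        XDocument doc = XDocument.Parse(xhtml);

        return doc.Descendants()
                  // Compare LocalName so an XHTML namespace does not matter.
                  .Where(e => e.Name.LocalName == "h2")
                  .Select(e => e.Value.Trim())
                  .ToList();
    }

    public static void Main()
    {
        // Sample markup invented for this sketch.
        string page = "<html><body><h1>Title</h1><h2>First</h2>"
                    + "<p>text</p><h2>Second</h2></body></html>";

        foreach (string heading in ReadH2(page))
        {
            Console.WriteLine(heading);
        }
    }
}
```

Because the parser already knows where each element starts and ends, there is no need to trim the text before and after the tags by hand.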

The second way would be to use a third-party library like HtmlAgilityPack, which also supports XPath:

var nodes = doc.DocumentNode.SelectNodes("//form//input");
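For the <h2> case from the question, the XPath would be "//h2" rather than "//form//input". The following is only a sketch: the names H2Extractor and ExtractH2 are invented here, and it assumes you already have the page source as a string (how to get that string out of the ScrapySharp WebPage object is not covered).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class H2Extractor
{
    // Hypothetical helper: returns the inner text of every <h2> in the HTML.
    public static List<string> ExtractH2(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//h2");
        if (nodes == null)
        {
            return new List<string>();
        }

        // InnerText drops nested tags; DeEntitize turns &amp; etc. back into characters.
        return nodes.Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
                    .ToList();
    }

    public static void Main()
    {
        // Sample markup invented for this sketch; note the unclosed <p>,
        // which an XML parser would reject.
        string page = "<html><body><p>intro<h2>First</h2><h2>Second &amp; last</h2></body></html>";

        foreach (string heading in ExtractH2(page))
        {
            Console.WriteLine(heading);
        }
    }
}
```

Unlike the XML approach, this tolerates markup that is not well-formed, which is usually what you get when scraping.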

answered Nov 25 '18 at 13:52

mahlatse
