python - finding elements by attribute with lxml

June 15, 2015

I need to parse an XML file and extract some data. I only need the elements that have certain attributes. Here's an example of the document:

<root>
    <articles>
        <article type="news">
            <content>some text</content>
        </article>
        <article type="info">
            <content>some text</content>
        </article>
        <article type="news">
            <content>some text</content>
        </article>
    </articles>
</root>

Here I only want the articles whose type is "news". What's the most efficient and elegant way to do this with lxml?

I tried the find method, but it's not very nice:

from lxml import etree

f = etree.parse("myfile")
root = f.getroot()
articles = root[0]  # first child of <root>, i.e. <articles>
article_list = articles.findall('article')
for article in article_list:
    if "type" in article.keys():
        if article.attrib['type'] == 'news':
            content = article.find('content')
            content = content.text
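If you want to stay with find/findall, the ElementPath syntax they accept also supports an [@attr='value'] predicate, which removes the manual key and value checks. The sketch below assumes the sample document is parsed from a string instead of from "myfile" so that it runs on its own.

```python
from lxml import etree

# Sample document from the question, parsed from a string rather than
# the file "myfile" so this sketch is self-contained.
XML = b"""<root>
  <articles>
    <article type="news"><content>some text</content></article>
    <article type="info"><content>some text</content></article>
    <article type="news"><content>some text</content></article>
  </articles>
</root>"""

root = etree.fromstring(XML)

# ElementPath (the path language used by find/findall) understands
# [@type='news'], so only matching <article> elements are returned.
news_articles = root.findall(".//article[@type='news']")

# findtext returns the text of the first matching child element.
contents = [article.findtext("content") for article in news_articles]
print(contents)
```

This covers the simple attribute-equals case; for anything more complex (functions, axes, text() selection), the full XPath support of xpath() is the better fit.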

You can use XPath, e.g. root.xpath("//article[@type='news']")

This XPath expression returns a list of all <article/> elements whose "type" attribute has the value "news". You can then iterate over it and do whatever you want, or pass it wherever you need it.
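A minimal sketch of iterating over that result, assuming you may want the attribute value as a parameter rather than hard-coded. The helper articles_of_type is hypothetical, and the content strings are changed from the question's sample so the matches are distinguishable. It uses lxml's XPath variables ($kind), which are bound from keyword arguments to xpath() instead of being pasted into the expression string.

```python
from lxml import etree

root = etree.fromstring(b"""<root>
  <articles>
    <article type="news"><content>first</content></article>
    <article type="info"><content>second</content></article>
    <article type="news"><content>third</content></article>
  </articles>
</root>""")


def articles_of_type(tree, article_type):
    # Hypothetical helper. $kind is an XPath variable; lxml binds it from
    # the keyword argument, so no string formatting or quoting is needed.
    return tree.xpath("//article[@type=$kind]", kind=article_type)


for article in articles_of_type(root, "news"):
    # Each result is an ordinary element: .get() reads an attribute,
    # .findtext() reads the text of a child element.
    print(article.get("type"), article.findtext("content"))
```

Binding the value as a variable avoids quoting problems when the attribute value comes from user input.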

To get just the text content, you can extend the XPath like so:

from lxml import etree

root = etree.fromstring("""
<root>
    <articles>
        <article type="news">
            <content>some text</content>
        </article>
        <article type="info">
            <content>some text</content>
        </article>
        <article type="news">
            <content>some text</content>
        </article>
    </articles>
</root>
""")

print(root.xpath("//article[@type='news']/content/text()"))

This outputs ['some text', 'some text']. Or, if you just wanted the content elements, use "//article[@type='news']/content", and so on.
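If the same expression is evaluated many times (for example, once per file in a batch), lxml also lets you compile it up front with etree.XPath and call the resulting object on each tree. This is a sketch under that assumption; the name news_text and the distinct content strings are only for illustration.

```python
from lxml import etree

# Compile the expression once; the XPath object is callable on any
# element or tree, so the expression is not re-parsed on every call.
news_text = etree.XPath("//article[@type='news']/content/text()")

doc = etree.fromstring(b"""<root>
  <articles>
    <article type="news"><content>some text</content></article>
    <article type="info"><content>other text</content></article>
    <article type="news"><content>more text</content></article>
  </articles>
</root>""")

texts = news_text(doc)
# text() results are "smart strings": str subclasses that also remember
# their parent element, reachable through .getparent().
print(texts)
print(texts[0].getparent().tag)
```

The second print shows the smart-string behaviour: each text result can lead you back to its <content> element without a second query.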
