python - Finding elements by attribute with lxml


I need to parse an XML file to extract some data. I only need the elements that have a certain attribute value. Here's an example of the document:

<root>
    <articles>
        <article type="news">
             <content>some text</content>
        </article>
        <article type="info">
             <content>some text</content>
        </article>
        <article type="news">
             <content>some text</content>
        </article>
    </articles>
</root>

Here I would like to get only the articles with the type "news". What's the most efficient and elegant way to do it with lxml?

I tried with the find method, but it's not very nice:

from lxml import etree

f = etree.parse("myfile")
root = f.getroot()
# First child of <root> is <articles>
articles = root.getchildren()[0]
article_list = articles.findall('article')
for article in article_list:
    if "type" in article.keys():
        if article.attrib['type'] == 'news':
            content = article.find('content')
            content = content.text

You can use XPath, e.g. root.xpath("//article[@type='news']")

This XPath expression will return a list of all <article/> elements whose "type" attribute has the value "news". You can then iterate over it to do what you want, or pass it wherever.
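As a minimal sketch of that iteration, assuming the sample document from the question is held in a string (the names XML and news_articles are only illustrative):

```python
from lxml import etree

# Sample document copied from the question.
XML = """
<root>
    <articles>
        <article type="news">
             <content>some text</content>
        </article>
        <article type="info">
             <content>some text</content>
        </article>
        <article type="news">
             <content>some text</content>
        </article>
    </articles>
</root>
"""

root = etree.fromstring(XML)

# Every item in the result is an <article> element with type="news".
news_articles = root.xpath("//article[@type='news']")

for article in news_articles:
    # findtext() returns the text of the first matching child, or None.
    print(article.get("type"), article.findtext("content"))
```

Because the results are ordinary lxml elements, the usual element methods such as get(), find() and findtext() work on each of them.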

To get just the text content, you can extend the XPath like so:

from lxml import etree

root = etree.fromstring("""
<root>
    <articles>
        <article type="news">
             <content>some text</content>
        </article>
        <article type="info">
             <content>some text</content>
        </article>
        <article type="news">
             <content>some text</content>
        </article>
    </articles>
</root>
""")

# text() selects the text nodes of each matching <content> element
print(root.xpath("//article[@type='news']/content/text()"))

This will output ['some text', 'some text']. Or, if you just wanted the content elements, it would be "//article[@type='news']/content", and so on.
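A rough sketch of the content-element variant follows. The helper name content_elements and the variable name $wanted are made up for illustration. The sketch relies on lxml's xpath() binding XPath variables from keyword arguments, which avoids building the expression by string formatting when the type comes from elsewhere:

```python
from lxml import etree

# Sample document copied from the question.
XML = """
<root>
    <articles>
        <article type="news">
             <content>some text</content>
        </article>
        <article type="info">
             <content>some text</content>
        </article>
        <article type="news">
             <content>some text</content>
        </article>
    </articles>
</root>
"""

root = etree.fromstring(XML)


def content_elements(tree_root, article_type):
    """Return the <content> elements of articles with the given type.

    Hypothetical helper, not part of lxml.
    """
    # $wanted is an XPath variable; lxml fills it from the keyword argument.
    return tree_root.xpath("//article[@type=$wanted]/content", wanted=article_type)


news_content = content_elements(root, "news")
print([el.tag for el in news_content])
print([el.text for el in news_content])
```

An attribute value that matches nothing simply gives an empty list, so the caller does not need a separate existence check.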

