Python - Finding elements by attribute with lxml
I need to parse an XML file and extract some data. I only need elements with certain attributes; here's an example of the document:
    <root>
        <articles>
            <article type="news">
                <content>some text</content>
            </article>
            <article type="info">
                <content>some text</content>
            </article>
            <article type="news">
                <content>some text</content>
            </article>
        </articles>
    </root>
Here I would like to get only the articles of type "news". What's the most efficient and elegant way to do that with lxml?
I tried the find method, but it's not very nice:
    from lxml import etree

    f = etree.parse("myfile")
    root = f.getroot()
    articles = root.getchildren()[0]
    article_list = articles.findall('article')
    for article in article_list:
        if "type" in article.keys():
            if article.attrib['type'] == 'news':
                content = article.find('content')
                content = content.text
You can use XPath, e.g. root.xpath("//article[@type='news']")
This XPath expression returns a list of all <article/> elements that have a "type" attribute with the value "news". You can then iterate over it to do what you want, or pass it wherever you need.
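As a minimal sketch of that iteration (the sample document is the one from the question; the variable names are only illustrative):

```python
from lxml import etree

# Sample document copied from the question.
root = etree.fromstring("""
<root>
  <articles>
    <article type="news"><content>some text</content></article>
    <article type="info"><content>some text</content></article>
    <article type="news"><content>some text</content></article>
  </articles>
</root>
""")

# xpath() returns a plain Python list of the matching elements.
news_articles = root.xpath("//article[@type='news']")

for article in news_articles:
    # findtext() gives the text of the first matching child, or None.
    print(article.get("type"), article.findtext("content"))
```

Because the result is an ordinary list, the usual tools (len(), slicing, list comprehensions) work on it directly.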
To get just the text content, you can extend the XPath like so:
    from lxml import etree

    root = etree.fromstring("""
    <root>
        <articles>
            <article type="news">
                <content>some text</content>
            </article>
            <article type="info">
                <content>some text</content>
            </article>
            <article type="news">
                <content>some text</content>
            </article>
        </articles>
    </root>
    """)

    # text() selects the text nodes of each matching <content> element
    print(root.xpath("//article[@type='news']/content/text()"))
And that will output ['some text', 'some text']. Or, if you just wanted the content elements, it would be "//article[@type='news']/content", and so on.
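A small sketch of the element-returning variant, assuming the same sample document. It also shows lxml's XPath variables, which let you pass the attribute value as a keyword argument instead of formatting it into the expression string; the names wanted and t are hypothetical, chosen only for this example:

```python
from lxml import etree

# Same sample document as in the question, written compactly.
root = etree.fromstring(
    '<root><articles>'
    '<article type="news"><content>some text</content></article>'
    '<article type="info"><content>some text</content></article>'
    '<article type="news"><content>some text</content></article>'
    '</articles></root>'
)

# Select the <content> elements themselves rather than their text.
contents = root.xpath("//article[@type='news']/content")
texts = [c.text for c in contents]

# XPath variables ($t here) are bound through keyword arguments.
wanted = "info"  # hypothetical value, e.g. taken from user input
info_contents = root.xpath("//article[@type=$t]/content", t=wanted)

print(texts)
print(len(info_contents))
```

Selecting elements instead of text() is useful when you still need access to attributes or child nodes of each match.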