Parsing XML files with Python (xml.etree.ElementTree)

Francesco Cento

73K views

Published: 10 Jul 2024
Overview of the Python xml.etree.ElementTree module for parsing, editing, and creating XML files.
Reference: docs.python.org/3/library/xml...
Next video of the series, covering special cases such as elements with missing tags: • Fixing Missing Element...

Comments • 45

@jdvelasquezr 1 month ago
Thank you, Francesco, for taking the time to review this library's different functions. You have greatly helped me finish a much-needed script for our localization engineering tasks. Notably, adding text to an existing tag saved the day.
@bayrakmusti1 1 year ago ⁺³
That's how it is supposed to be taught. I have been browsing the courses on how to do it and they all are complicated. Thankfully found this video. Thanks a lot. Great job!
@user-xu8od8tf9l 9 months ago ⁺²
Thanks a lot for the great tutorial. Your approach to XML parsing was spot-on for me and it was exactly what I was looking for to get started on XML parsing.
@ginopeduto4264 3 years ago ⁺⁵
Grazie Mille!!! That was exactly what I was looking for and all well explained!!!
@UsmanSaadat 2 years ago ⁺⁵
Thanks a lot for this video. I couldn't grasp the concepts properly even after reading from books. This video made it look like a piece of cake.
@RodrigoMontes 1 year ago ⁺¹
Excellent man!
This is what I was looking for :)
@A_A7337 2 years ago ⁺²
Great video. Thanks
@stanleymbah8983 2 years ago ⁺¹
thank you for this
@giacomocillari4448 2 years ago
Is there a way to change a sub-element instead of the whole element string? Let's say, for example, that I want to change W to SW but not the name, and I need to do it in a loop, so I can't put the name string inside as it changes every time. Is there a way to call the specific sub-element?
@RS-el7iu 3 years ago
thanks a lot ❤
@attilioturco 5 months ago
nice vid thanks
@debasishsahoo1268 24 days ago
Awesome
@KiviliG 3 years ago
Can this be done with the BeautifulSoup library?
@arnolda7417 3 years ago ⁺²
Hi Francesco. Thanks for the great video! I ran into an error after editing my xml file. I tried to view the entire file to make sure my changes were made with ET.dump(tree) and I always get "AttributeError: 'str' object has no attribute 'items'" I'm testing with Jupyter notebook and when I restart the kernel, ET.dump works just fine before I make changes to the file. Any idea on how to fix this? I'm new to Python.
@fcento 3 years ago ⁺¹
Hi Arnold, can you share the code?
@arnolda7417 3 years ago
@fcento Absolutely. Is there an email address I can send it to? I'd like to include the payload as well for reference.
@hoscoharding7319 3 years ago ⁺¹
Hi Francesco! I have been trying to do something with ElementTree for several days but it is impossible for me... And it gives me the feeling that it is very simple. I want to make a little script that adds a child element only if the parent doesn't already have it. Imagine that the document lacks a year for Panama. My script would go through the XML document and add the year only to Panama... Could you give me some idea please?
Many thanks.
@fcento 3 years ago ⁺³
I've just made a video about it: ruclips.net/video/5BrVPpOifto/видео.html
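A minimal sketch of the "add a child only if it is missing" idea from the question above. It is not taken from the linked video; the sample XML, the `country`/`year` layout, and the placeholder year value are assumptions chosen to match the Panama example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: Panama has no <year> child.
xml_text = """
<data>
  <country name="Liechtenstein"><rank>1</rank><year>2008</year></country>
  <country name="Panama"><rank>68</rank></country>
</data>
"""
root = ET.fromstring(xml_text)

for country in root.findall("country"):
    # find() returns None when no direct child with that tag exists.
    if country.find("year") is None:
        year = ET.SubElement(country, "year")
        year.text = "2011"  # placeholder value, an assumption

# Collect the year of every country to check the result.
years = {c.get("name"): c.findtext("year") for c in root.findall("country")}
print(years)
```

Testing with `is None` rather than plain truthiness matters here, because an element with no children of its own can evaluate as false even when it exists.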
@xst9880 6 months ago
Can you show us how to parse a Tableau dashboard file (*.twb)? It's an XML file, Tableau just renamed it. I am trying to create a data dictionary from the .twb file.
@vijayalakshmi8282 2 years ago
Hi Francesco, great video, thanks. I need a small suggestion: let's say [markup stripped] 100, and I need output like KTOPL 100. I need both the tag and the value; how can we get that? Can you please explain?
@sidjjj 2 years ago
Thanks for this video, I needed to parse xml from a variable instead of a file and found this : xml_data_tree = ET.fromstring(received_packet)
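A short sketch of parsing XML held in a variable rather than a file, as described in the comment above. The names `received_packet` and `xml_data_tree` come from the comment; the payload itself is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload; in the comment above this arrives in `received_packet`.
received_packet = "<packet><id>42</id><status>ok</status></packet>"

# fromstring() parses a str (or bytes) and returns the root Element directly,
# not an ElementTree, so there is no getroot() call.
xml_data_tree = ET.fromstring(received_packet)

# Map each child tag to its text.
values = {child.tag: child.text for child in xml_data_tree}
print(xml_data_tree.tag, values)
```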
@markdillon9588 2 years ago
Can you mass-edit multiple files?
@shrinivasulunandyala9269 2 years ago
Merging XML files using Python: can you please make a video on this topic?
@myyoutubeaccount0123_ 2 years ago
thanks a lot
@fcento 2 years ago
Happy to help
@CinemagicMindset 2 years ago
Hi Francesco,
I'm getting an error while parsing an XML file because it contains special words. Kindly help me avoid this error.
Error : xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 277, column 366
@fcento 2 years ago
If you are sure the file you have is a valid xml (there are online tools to help you there), then what comes to mind is incorrect encoding. Check the documentation here: docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.XMLParser
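A hedged sketch of the encoding case mentioned in the reply above, which is only one possible cause of an "invalid token" error. The sample bytes and the choice of ISO-8859-1 are assumptions for illustration; the real file's encoding would need to be checked.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: Latin-1 bytes with no XML declaration, so the parser
# assumes UTF-8 and stumbles on the byte 0xE1.
raw = "<city>Málaga</city>".encode("iso-8859-1")

try:
    ET.fromstring(raw)
    default_parse_failed = False
except ET.ParseError as err:
    default_parse_failed = True
    print("Default parse failed:", err)

# Passing an explicit encoding to XMLParser overrides whatever the document
# declares (or fails to declare).
parser = ET.XMLParser(encoding="iso-8859-1")
root = ET.fromstring(raw, parser=parser)
print(root.text)
```

The line and column numbers in the ParseError message point at the first offending byte, which is usually the quickest way to see which character the parser rejected.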
@fantasticprajwal7442 3 years ago
How do I install xmltree in Python 2.7.5?
I am not able to upgrade due to restrictions.
@fcento 3 years ago
docs.python.org/2.7/library/xml.etree.elementtree.html It's a built-in library, so there is no need to install it. I recommend you start sorting out the restrictions, because 2.7 is deprecated. 3.8 is now also available with Anaconda. Some of the code I used may not work on 2.7, bear that in mind.
@KrishnaManohar8021 3 years ago
Looking forward...
@arshap9351 3 years ago ⁺³
Increase your font size before doing tutorials. It's quite hard to read the text. Anyway, good job.
@padraigmaccu9333 3 years ago ⁺¹
Go raibh céad maith agat, a Francesco. Rud a bhí de dhíth orm le fada. [Thank you very much, Francesco. Something I have needed for a long time.] Pádraig Mac Con Uladh
@codelearnexe475 2 years ago ⁺¹
Was not expecting Irish in this chat XD
@Gamer-mg6my 2 years ago
Hi, I'm trying to get the text of every tag named [tag stripped], but every such tag also contains this: [tag stripped]. Any idea how to extract the content of the tags?
[XML sample: the markup was stripped when the comment was posted, leaving only element values; the text values to extract are Entity, EntityTwo, and EntityThree.]
@fcento 2 years ago ⁺¹
Let's take it in steps. I'm assuming you want to extract 'Entity', 'EntityTwo', 'EntityThree' from the [tag stripped] element (let me know if I misunderstood your question). The way it's formatted, it contains 2 elements ([tag stripped] and [tag stripped]) as well as the piece of text you want to extract. If you just use findall() and read 'text' you get None back; what you want to use in this case is 'tail' instead. I've included sample code here: gist.github.com/fcento100/74b8691af014a8126f8e9ca2ff03c6ea
@fcento 2 years ago ⁺¹
I've put the XML code from your comment in a file here, gist.github.com/fcento100/19cb7ae6b857c539a2c2843519239efc, for convenience.
@Gamer-mg6my 2 years ago
@fcento Yes, you understood me correctly. Ohhhh, with tail. Well, I checked it, but it didn't run with another XML :( Instead, I used findall('.//cp', ns) and printed elm.tail, and with that we got the text. I like your solution more, but it didn't run with the other XML :((((( This is the error that I got:
elmtail = elm.tail.strip()
AttributeError: 'NoneType' object has no attribute 'strip'
@fcento 2 years ago ⁺¹
Apologies for not catching the 'NoneType' error; 'tail' returns None if it doesn't find anything, rather than an empty string. It's fixed now in this version: gist.github.com/fcento100/11847ad0d8d42eec6c1dc42de897b842 with an if statement to catch it. The reason I wasn't getting this error is that I copied and pasted from your message, and since it was formatted, 'tail' returned '\n' and '\t' (the string representations of newline and tab) where it should have returned None, which is why I was able to run the strip command everywhere without error.
In the new code I posted I've shown 2 methods of getting at that piece of data; in your sample XML, "Entity" etc. is the tail of [tag stripped]; root.findall('.//visio:Text/',ns) and root.findall('.//visio:cp',ns) do similar things. The only difference is that using './/visio:Text/' in method 1 will also extract the tail for [tag stripped] if [tag stripped] is available, which may be undesirable! In that case './/visio:cp', as you suggested, is the way to go.
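The 'text' versus 'tail' distinction and the None check discussed above can be sketched as follows. This is not the code from the linked gists; the namespace URI, the sample fragment, and the assumption that the wanted string follows a `cp` element are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical Visio-like fragment; the namespace URI is a placeholder.
xml_text = """
<Shapes xmlns="http://example.com/visio">
  <Shape><Text><cp IX="0"/>Entity</Text></Shape>
  <Shape><Text><cp IX="0"/>EntityTwo</Text></Shape>
  <Shape><Text><cp IX="0"/></Text></Shape>
</Shapes>
"""
ns = {"visio": "http://example.com/visio"}
root = ET.fromstring(xml_text)

names = []
for cp in root.findall(".//visio:cp", ns):
    # The string after <cp/> is stored in cp.tail, not in the parent's text.
    # tail is None when nothing follows the element, so guard before strip().
    tail = (cp.tail or "").strip()
    if tail:
        names.append(tail)

# The parent's own text is None because its content starts with a child element.
first_text = root.find(".//visio:Text", ns).text
print(names, first_text)
```

The `(cp.tail or "")` guard plays the same role as the if statement mentioned in the reply: it avoids calling strip() on None.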
@Gamer-mg6my 2 years ago
@fcento Thanks a lot for your kind help, Francesco :))
