{"id":66473,"date":"2022-08-15T09:01:29","date_gmt":"2022-08-15T09:01:29","guid":{"rendered":"https:\/\/www.cryptocabaret.com\/?p=66473"},"modified":"2022-08-15T09:01:29","modified_gmt":"2022-08-15T09:01:29","slug":"how-odt-files-are-structured","status":"publish","type":"post","link":"https:\/\/www.cryptocabaret.com\/?p=66473","title":{"rendered":"How ODT files are structured"},"content":{"rendered":"<p><span class=\"field field--name-title field--type-string field--label-hidden\">How ODT files are structured<\/span><br \/>\n<span class=\"field field--name-uid field--type-entity-reference field--label-hidden\"><a title=\"View user profile.\" href=\"https:\/\/opensource.com\/users\/jim-hall\" class=\"username\">Jim Hall<\/a><\/span><br \/>\n<span class=\"field field--name-created field--type-created field--label-hidden\">Mon, 08\/15\/2022 &#8211; 03:00<\/span><\/p>\n<div class=\"clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item\">\n<p>Word processing files used to be closed, proprietary formats. In some older word processors, the document file was essentially a memory dump from the word processor. While this made for faster loading of the document into the word processor, it also made the document file format an opaque mess.<\/p>\n<p>Around 2005, the Organization for the Advancement of Structured Information Standards (OASIS) defined an open format for office documents of all types, the Open Document Format for Office Applications (ODF). You may also see ODF referred to as simply &#8220;OpenDocument Format&#8221; because it is an open standard based on the <a href=\"http:\/\/openoffice.org\/\">OpenOffice.org<\/a> XML file specification. 
ODF includes several file types, including ODT for OpenDocument Text documents. There&#8217;s a lot to explore in an ODT file, and it starts with a zip file.<\/p>\n<h2>Zip structure<\/h2>\n<p>Like all ODF files, ODT is actually an XML document and other files wrapped in a zip file container. 
Using zip means files take less room on disk, but it also means you can use standard zip tools to examine an ODF file.<\/p>\n<p>I have an article about IT leadership called &#8220;Nibbled to death by ducks&#8221; that I saved as an ODT file. Since this is an ODF file, which is a zip file container, you can use unzip from the command line to examine it:<\/p>\n<pre>\n<div class=\"geshifilter\"><div class=\"bash geshifilter-bash\">$ <span class=\"kw2\">unzip<\/span> <span class=\"re5\">-l<\/span> <span class=\"st_h\">'Nibbled to death by ducks.odt'<\/span><br>\nArchive: Nibbled to death by ducks.odt<br>\nLength Date Time Name<br><span class=\"nu0\">39<\/span> 07-<span class=\"nu0\">15<\/span>-<span class=\"nu0\">2022<\/span> <span class=\"nu0\">22<\/span>:<span class=\"nu0\">18<\/span> mimetype<br><span class=\"nu0\">12713<\/span> 07-<span class=\"nu0\">15<\/span>-<span class=\"nu0\">2022<\/span> <span class=\"nu0\">22<\/span>:<span class=\"nu0\">18<\/span> Thumbnails<span class=\"sy0\">\/<\/span>thumbnail.png<br><span class=\"nu0\">915001<\/span> 07-<span class=\"nu0\">15<\/span>-<span class=\"nu0\">2022<\/span> <span class=\"nu0\">22<\/span>:<span class=\"nu0\">18<\/span> Pictures<span class=\"sy0\">\/<\/span>10000201000004500000026DBF6636B0B9352031.png<br><span class=\"nu0\">10879<\/span> 07-<span class=\"nu0\">15<\/span>-<span class=\"nu0\">2022<\/span> <span class=\"nu0\">22<\/span>:<span class=\"nu0\">18<\/span> content.xml<br><span class=\"nu0\">20048<\/span> 07-<span class=\"nu0\">15<\/span>-<span class=\"nu0\">2022<\/span> <span class=\"nu0\">22<\/span>:<span class=\"nu0\">18<\/span> styles.xml<br><span class=\"nu0\">9576<\/span> 07-<span class=\"nu0\">15<\/span>-<span class=\"nu0\">2022<\/span> <span class=\"nu0\">22<\/span>:<span class=\"nu0\">18<\/span> settings.xml<br><span class=\"nu0\">757<\/span> 07-<span class=\"nu0\">15<\/span>-<span class=\"nu0\">2022<\/span> <span class=\"nu0\">22<\/span>:<span class=\"nu0\">18<\/span> meta.xml<br><span 
class=\"nu0\">260<\/span> 07-<span class=\"nu0\">15<\/span>-<span class=\"nu0\">2022<\/span> <span class=\"nu0\">22<\/span>:<span class=\"nu0\">18<\/span> manifest.rdf<br><span class=\"nu0\">0<\/span> 07-<span class=\"nu0\">15<\/span>-<span class=\"nu0\">2022<\/span> <span class=\"nu0\">22<\/span>:<span class=\"nu0\">18<\/span> Configurations2<span class=\"sy0\">\/<\/span>accelerator<span class=\"sy0\">\/<\/span><br><span class=\"nu0\">0<\/span> 07-<span class=\"nu0\">15<\/span>-<span class=\"nu0\">2022<\/span> <span class=\"nu0\">22<\/span>:<span class=\"nu0\">18<\/span> Configurations2<span class=\"sy0\">\/<\/span>toolpanel<span class=\"sy0\">\/<\/span><br><span class=\"nu0\">0<\/span> 07-<span class=\"nu0\">15<\/span>-<span class=\"nu0\">2022<\/span> <span class=\"nu0\">22<\/span>:<span class=\"nu0\">18<\/span> Configurations2<span class=\"sy0\">\/<\/span>statusbar<span class=\"sy0\">\/<\/span><br><span class=\"nu0\">0<\/span> 07-<span class=\"nu0\">15<\/span>-<span class=\"nu0\">2022<\/span> <span class=\"nu0\">22<\/span>:<span class=\"nu0\">18<\/span> Configurations2<span class=\"sy0\">\/<\/span>progressbar<span class=\"sy0\">\/<\/span><br><span class=\"nu0\">0<\/span> 07-<span class=\"nu0\">15<\/span>-<span class=\"nu0\">2022<\/span> <span class=\"nu0\">22<\/span>:<span class=\"nu0\">18<\/span> Configurations2<span class=\"sy0\">\/<\/span>toolbar<span class=\"sy0\">\/<\/span><br><span class=\"nu0\">0<\/span> 07-<span class=\"nu0\">15<\/span>-<span class=\"nu0\">2022<\/span> <span class=\"nu0\">22<\/span>:<span class=\"nu0\">18<\/span> Configurations2<span class=\"sy0\">\/<\/span>popupmenu<span class=\"sy0\">\/<\/span><br><span class=\"nu0\">0<\/span> 07-<span class=\"nu0\">15<\/span>-<span class=\"nu0\">2022<\/span> <span class=\"nu0\">22<\/span>:<span class=\"nu0\">18<\/span> Configurations2<span class=\"sy0\">\/<\/span>floater<span class=\"sy0\">\/<\/span><br><span class=\"nu0\">0<\/span> 07-<span class=\"nu0\">15<\/span>-<span class=\"nu0\">2022<\/span> 
<span class=\"nu0\">22<\/span>:<span class=\"nu0\">18<\/span> Configurations2<span class=\"sy0\">\/<\/span>menubar<span class=\"sy0\">\/<\/span><br><span class=\"nu0\">1192<\/span> 07-<span class=\"nu0\">15<\/span>-<span class=\"nu0\">2022<\/span> <span class=\"nu0\">22<\/span>:<span class=\"nu0\">18<\/span> META-INF<span class=\"sy0\">\/<\/span>manifest.xml<br><span class=\"nu0\">970465<\/span> <span class=\"nu0\">17<\/span> files<\/div><\/div><\/pre>\n<p>I want to highlight a few elements of the zip file structure:<\/p>\n<ol>\n<li>The <code>mimetype<\/code> file contains a single line that defines the ODF document. Programs that process ODT files, such as a word processor, can use this file to verify the <code>MIME<\/code> type of the document. For an ODT file, this should always be:<\/li>\n<\/ol>\n<pre>\n<span class=\"geshifilter\"><code class=\"bash geshifilter-bash\">application<span class=\"sy0\">\/<\/span>vnd.oasis.opendocument.text<\/code><\/span><\/pre>\n<ol start=\"2\">\n<li>The <code>META-INF<\/code> directory has a single <code>manifest.xml<\/code> file in it. This file contains all the information about where to find other components of the ODT file. Any program that reads ODT files starts with this file to locate everything else. 
For example, the <code>manifest.xml<\/code> file for my ODT document contains this line that defines where to find the main content:<\/li>\n<\/ol>\n<pre>\n<span class=\"geshifilter\"><code class=\"bash geshifilter-bash\"><span class=\"sy0\">&lt;manifest:file-entry manifest:full-path=<span class=\"st0\">\"content.xml\"<\/span> manifest:media-type=<span class=\"st0\">\"text\/xml\"<\/span><span class=\"sy0\">\/&gt;<\/span><\/span><\/code><\/span><\/pre>\n<ol start=\"3\">\n<li>\n<p>The <code>content.xml<\/code> file contains the actual content of the document.<\/p>\n<\/li>\n<li>\n<p>My document includes a single screenshot, which is contained in the <code>Pictures<\/code> directory.<\/p>\n<\/li>\n<\/ol>\n<h2>Extracting files from an ODT file<\/h2>\n<p>Because the ODT document is just a zip file with a specific structure, you can extract files from it. You can start by unzipping the entire ODT file into a new directory with this <code>unzip<\/code> command:<\/p>\n<pre>\n<span class=\"geshifilter\"><code class=\"bash geshifilter-bash\"><span class=\"co4\">$ <\/span><span class=\"kw2\">unzip<\/span> <span class=\"re5\">-q<\/span> <span class=\"st_h\">'Nibbled to death by ducks.odt'<\/span> <span class=\"re5\">-d<\/span> Nibbled<\/code><\/span><\/pre>\n<p>A colleague recently asked for a copy of the image that I included in my article. I was able to find the exact location of the embedded image by looking in the <code>META-INF\/manifest.xml<\/code> file. 
The <code>grep<\/code> command can display any lines that describe an image:<\/p>\n<pre>\n<div class=\"geshifilter\"><div class=\"bash geshifilter-bash\">$ <span class=\"kw3\">cd<\/span> Nibbled<br>\n$ <span class=\"kw2\">grep<\/span> image META-INF<span class=\"sy0\">\/<\/span>manifest.xml<br><span class=\"sy0\">&lt;manifest:file-entry manifest:full-path=<span class=\"st0\">\"Thumbnails\/thumbnail.png\"<\/span> manifest:media-type=<span class=\"st0\">\"image\/png\"<\/span><span class=\"sy0\">\/&gt;<\/span><br><span class=\"sy0\">&lt;manifest:file-entry manifest:full-path=<span class=\"st0\">\"Pictures\/10000201000004500000026DBF6636B0B9352031.png\"<\/span> manifest:media-type=<span class=\"st0\">\"image\/png\"<\/span><span class=\"sy0\">\/&gt;<\/span><\/span><\/span><\/div><\/div><\/pre>\n<p>The image I&#8217;m looking for is saved in the <code>Pictures<\/code> folder. You can confirm that the folder was extracted by listing the contents of the current directory:<\/p>\n<pre>\n<div class=\"geshifilter\"><div class=\"bash geshifilter-bash\">$ <span class=\"kw2\">ls<\/span> <span class=\"re5\">-F<\/span><br>\nConfigurations2<span class=\"sy0\">\/<\/span> manifest.rdf meta.xml Pictures<span class=\"sy0\">\/<\/span> styles.xml<br>\ncontent.xml META-INF<span class=\"sy0\">\/<\/span> mimetype settings.xml Thumbnails<span class=\"sy0\">\/<\/span><\/div><\/div><\/pre>\n<p>And here it is:<\/p>\n<article class=\"align-center media media--type-image media--view-mode-default\">\n<div class=\"field field--name-field-media-image field--type-image field--label-hidden field__item\">  <img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.cryptocabaret.com\/wp-content\/uploads\/2022\/08\/ducks.png\" width=\"1104\" height=\"621\" alt=\"Image of rubber ducks in two bowls\"><\/div>\n<div class=\"field field--name-field-caption field--type-text-long field--label-hidden caption field__item\"><span class=\"caption__byline\">Image by: <\/span><\/p>\n<p>(Jim Hall, CC BY-SA 4.0)<\/p>\n<\/div>\n<\/article>\n<h2>OpenDocument 
Format<\/h2>\n<p>OpenDocument Format (ODF) is an open file format that can describe word processing files (ODT), spreadsheet files (ODS), presentations (ODP), and other file types. Because ODF files are based on open standards, you can use other tools to examine them and even extract data from them. You just need to know where to start. Every ODF file includes a <code>META-INF\/manifest.xml<\/code> file, which is the &#8220;root&#8221; or &#8220;bootstrap&#8221; file for the rest of the document. Once you know where to look, you can find the rest of the content.<\/p>\n<\/div>\n<div class=\"clearfix text-formatted field field--name-field-article-subhead field--type-text-long field--label-hidden field__item\">\n<p>Because OpenDocument Format (ODF) files are based on open standards, you can use other tools to examine them and even extract data from them. You just need to know where to start.<\/p>\n<\/div>\n<div class=\"field field--name-field-lead-image field--type-entity-reference field--label-hidden field__item\">\n<article class=\"media media--type-image media--view-mode-caption\">\n<div class=\"field field--name-field-media-image field--type-image field--label-hidden field__item\">  <img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.cryptocabaret.com\/wp-content\/uploads\/2022\/08\/coffee_tea_laptop_computer_work_desk-1.png\" width=\"520\" height=\"292\" alt=\"Person drinking a hot drink at the computer\" title=\"Person drinking a hot drink at the computer\"><\/div>\n<div class=\"field field--name-field-caption field--type-text-long field--label-hidden caption field__item\"><span class=\"caption__byline\">Image by: <\/span><\/p>\n<p><a href=\"https:\/\/unsplash.com\/@jonasleupe?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" target=\"_blank\" rel=\"ugc noopener\">Jonas Leupe<\/a>\u00a0on\u00a0<a 
href=\"https:\/\/unsplash.com\/s\/photos\/tea-cup-computer?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" target=\"_blank\" rel=\"ugc noopener\">Unsplash<\/a><\/p>\n<\/div>\n<\/article>\n<\/div>\n<div class=\"field field--name-field-tags field--type-entity-reference field--label-hidden field__items\">\n<div class=\"field__item\"><a href=\"https:\/\/opensource.com\/tags\/linux\" hreflang=\"en\">Linux<\/a><\/div>\n<div class=\"field__item\"><a href=\"https:\/\/opensource.com\/tags\/documentation\" hreflang=\"en\">Documentation<\/a><\/div>\n<\/p><\/div>\n<div class=\"field field--name-field-listicle-title field--type-string field--label-hidden field__item\">What to read next<\/div>\n<div class=\"field field--name-field-listicles field--type-entity-reference field--label-hidden field__items\">\n<div class=\"field__item\"><a href=\"https:\/\/opensource.com\/article\/22\/7\/fmt-trivial-text-formatter\" hreflang=\"en\">How I use the Linux fmt command to format text<\/a><\/div>\n<div class=\"field__item\"><a href=\"https:\/\/opensource.com\/article\/22\/8\/automate-file-edits-sed-linux\" hreflang=\"en\">How I use the Linux sed command to automate file edits<\/a><\/div>\n<div class=\"field__item\"><a href=\"https:\/\/opensource.com\/article\/22\/8\/old-school-technical-writing-groff\" hreflang=\"en\">Old-school technical writing with groff<\/a><\/div>\n<div class=\"field__item\"><a href=\"https:\/\/opensource.com\/article\/22\/8\/pdf-latex\" hreflang=\"en\">Create beautiful PDFs in LaTeX<\/a><\/div>\n<div class=\"field__item\"><a href=\"https:\/\/opensource.com\/article\/22\/8\/gentle-introduction-html\" hreflang=\"en\">A gentle introduction to HTML<\/a><\/div>\n<div class=\"field__item\"><a href=\"https:\/\/opensource.com\/article\/22\/8\/writing-project-documentation-html\" hreflang=\"en\">Writing project documentation in HTML<\/a><\/div>\n<div class=\"field__item\"><a 
href=\"https:\/\/opensource.com\/article\/22\/8\/css-html-project-documentation\" hreflang=\"en\">Level up your HTML document with CSS<\/a><\/div>\n<\/p><\/div>\n<div class=\"field field--name-field-default-license field--type-list-string field--label-hidden field__item\"><a rel=\"license\" href=\"http:\/\/creativecommons.org\/licenses\/by-sa\/4.0\/\"><br \/>\n        <img decoding=\"async\" alt=\"Creative Commons License\" src=\"https:\/\/www.cryptocabaret.com\/wp-content\/uploads\/2022\/08\/cc-by-sa--20.png\" title=\"This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.\"><\/a>This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.<\/div>\n","protected":false},"excerpt":{"rendered":"<p>How ODT files are structured Jim Hall Mon, 08\/15\/2022 &#8211; 03:00 Word processing files used to be closed, proprietary formats. In some older word processors, the document file was essentially a memory dump from the word processor. 
While this made for faster loading of the document into [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":66474,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[307],"tags":[],"class_list":["post-66473","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-open-source"],"_links":{"self":[{"href":"https:\/\/www.cryptocabaret.com\/index.php?rest_route=\/wp\/v2\/posts\/66473","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.cryptocabaret.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.cryptocabaret.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.cryptocabaret.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.cryptocabaret.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=66473"}],"version-history":[{"count":0,"href":"https:\/\/www.cryptocabaret.com\/index.php?rest_route=\/wp\/v2\/posts\/66473\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.cryptocabaret.com\/index.php?rest_route=\/wp\/v2\/media\/66474"}],"wp:attachment":[{"href":"https:\/\/www.cryptocabaret.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=66473"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.cryptocabaret.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=66473"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.cryptocabaret.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=66473"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}