Exploring the Visio file

I have been voluntarily supporting Visio for almost thirty years, since version 1, after I found a promo version of Visio on a Windows update floppy. Before that I had been creating business diagrams by writing code to drive plotters. The first version was in assembler, the second was in COBOL. The COBOL was far easier. Of course, anyone who has written COBOL can appreciate the eye-opener Visio was. You dragged the shapes from a stencil, made the connections, and if you moved a shape, the connections followed. PLUS, you were not limited to what was in the stencil. You could make your own shapes.

So, early on, I got into exploring the various Visio ShapeSheets. Initially the Visio file format was proprietary, but eventually they did release an XML file format option. Several versions back they standardized on Open XML, like the other members of the Office family. An Open XML file is a zipped directory of files, mainly XML. So, it was just a matter of copying the Visio file, changing the extension to ZIP and unzipping.
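Renaming and unzipping is not the only way in. As a minimal sketch, assuming only that the file is a valid Open XML package, the .NET ZipFile class can list the parts directly, without a copy or a rename. The function name Get-PackagePart is made up for this illustration and is not part of the script later in this post.

```powershell
# Sketch only: Get-PackagePart is a hypothetical helper name.
Add-Type -AssemblyName System.IO.Compression.FileSystem

function Get-PackagePart {
    param([Parameter(Mandatory)][string]$Path)

    # ZipFile looks at the content, not the extension, so .vsdx works as-is.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $zip = [System.IO.Compression.ZipFile]::OpenRead($fullPath)
    try {
        foreach ($entry in $zip.Entries) {
            [pscustomobject]@{
                Part   = $entry.FullName   # path of the part inside the package
                Length = $entry.Length     # uncompressed size in bytes
            }
        }
    }
    finally {
        $zip.Dispose()   # release the file handle
    }
}
```

Something like `Get-PackagePart -Path .\Drawing.vsdx | Sort-Object Part` would then show what is inside, where Drawing.vsdx is a placeholder for one of your own files.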

So, exploring a Visio file became practical. Of course, it was a directory of files and not easy to search, and each XML file was a single string, so reading the XML was a challenge. I was able to copy some of the XML files to a Word doc and use VBA to separate the tags and make them more readable. I was able to find out some interesting things and was able to copy blocks of Shape Data between shapes, but it was tedious. So I resorted to using Excel to store the Shape Data information and VBA to go through the shapes, delete the Shape Data and replace it with what was in the Excel sheet. Visio does not make it easy to sort Shape Data. It provides a Sort field to give the impression of sorting, but the records themselves are not sorted.

Along comes PowerShell to the rescue. I have used it to work with the Visio object model, so why could it not help with files? It could find the file, uncompress the ZIP file and wander the directory tree. With the help of a PowerShell wizard I was able to come up with something that did almost all the heavy lifting. This is what I came up with…

Set things up to find the file

$StartDir = [Environment]::GetFolderPath('mydocuments') + '/visio'
$verbose = $false
$tmpTyp = 0
if ($verbose) {$tmpTyp = 1}

cd $StartDir    # Setup starting directory

Create a dialog to ask the user for the file name…

# Create a dialog to request the file name.
Add-Type -AssemblyName System.Windows.Forms
$dialog = New-Object System.Windows.Forms.OpenFileDialog -Property @{
   InitialDirectory = $StartDir
   Title = 'Visio Extract XML'
   Filter = 'All files (*.*)|*.*'
}

Show the dialog and extract the filename…

# Show the dialog; stop if the user cancels.
if ($dialog.ShowDialog() -ne [System.Windows.Forms.DialogResult]::OK) { $dialog.Dispose(); return }
# Get the filename from the dialog.
# Expand-Archive only understands Zip files.
# The OpenXML files are zip files, so we have to rename to make Expand-Archive happy.
$src = $dialog.FileName
# Get the directory and the filename with the dot removed (Drawing.vsdx becomes Drawingvsdx).
$fildir = Split-Path -Path $src -Parent
$filnam = $dialog.SafeFileName.Replace(".","")
# Remove the dot from the filename only, so dots in folder names are left alone.
$filnom = Join-Path $fildir $filnam
$dest = $filnom + ".zip"
Write-Output $filnam
$dialog.Dispose()   # get rid of the dialog, it has served its purpose

Now that we know what file to use, make a copy and change the extension to ZIP. Expand-Archive is fussy and wants ZIP as the extension. Since Visio has several file extensions for Open XML (VSDX, VSSX, VSTX), make the extension part of the name…

# Create a Zip file of the OpenXML file.
Copy-Item -LiteralPath $src -Destination $dest
# Expand the Zip file into a folder beside the original, named after the file.
Expand-Archive -LiteralPath $dest -DestinationPath $filnom -Force
Remove-Item -LiteralPath $dest   # The Zip file is no longer needed

Now for the real work. Surprisingly, the actual work is trivial compared to what I was expecting. The setup took most of the script. So, switch into the directory we just created and find the names of the files within it.

# Now for the real work.
cd $filnom
# Get a list of the file names.
$Fnames = Get-ChildItem -Recurse | Where-Object { -not $_.PSIsContainer }

Once we have the list of files, go through each file and process it. To simplify the script, I evaluated each file extension and determined how the file should be processed. I added a flag, $verbose, to keep the size of the output down. The relationship files can be ignored, but if necessary I can throw the switch to include them. $verbose sets the value of $tmpTyp at the start.
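The script treats everything after the first dot in the name as the extension, which is why the switch that follows lists both XML.RELS and RELS. As a small illustration (the helper name Get-ExtensionAfterFirstDot is made up here and is not part of the script), this is what that rule gives for the kind of names found in a Visio package:

```powershell
# Hypothetical helper, for illustration only: everything after the first dot.
function Get-ExtensionAfterFirstDot {
    param([Parameter(Mandatory)][string]$Name)
    $Name.Substring($Name.IndexOf('.') + 1)
}

# Names of the kind found inside an Open XML package.
foreach ($name in 'document.xml', 'document.xml.rels', '.rels', '[Content_Types].xml') {
    '{0,-22} -> {1}' -f $name, (Get-ExtensionAfterFirstDot -Name $name)
}
```

The switch statement compares strings without regard to case by default, so the lowercase results still match the uppercase labels.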

# Loop through the filenames – write a header and then break up the XML.
foreach ($file in $Fnames) { 
$f1=$file.name
# $ If (-not($f1 = “[Content_Type].xml”)){
 
$fext = $file.Name.Substring($file.Name.IndexOf(".")+1)

switch ($fext) {
   "XML"      {$typ = 1; break}
   "EMF"      {$typ = 2; break}
   "XML.RELS" {$typ = $tmpTyp; break}
   "RELS"     {$typ = $tmpTyp; break}
   "PNG"      {$typ = 2; break}
   "MP4"      {$typ = 2; break}
   "VTT"      {$typ = 2; break}
   default    {$typ = 3; break}   # anything else gets reported as unknown
}

Now process the file based on the type. For XML files, a line break is inserted between adjacent tags. Image and other media files just get the name of the file. We do not care about the image, just the name and that it exists. If it is an unidentified extension, then the extension is flagged as unknown.

Throw in a blank line to make the spacing between files noticeable. … and the final brace to close things off.

switch ($typ) {
   0 {} # $typ 0 means ignore
   1 {
      $txt = "<<<<" + $fext + ">>>>  ===" + $file.Name + "==="
      Write-Output $txt
      # -LiteralPath keeps the square brackets in [Content_Types].xml from being read as wildcards.
      Get-Content -LiteralPath $file.FullName | ForEach-Object { $_ -replace "><",">`r`n<" }
   }
   2 { Write-Output "<<<< image >>>>" $file.Name }
   default { Write-Output "unknown" $fext }
}
Write-Output "  "
}
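The -replace in the script puts each tag on its own line but does not indent anything. If indentation would help, one option (a sketch, not something the script above does) is to let .NET reformat the XML. Format-XmlText is a made-up helper name, and the sample XML is invented for the illustration.

```powershell
# Sketch only: Format-XmlText is a hypothetical helper name.
Add-Type -AssemblyName System.Xml.Linq

function Format-XmlText {
    param([Parameter(Mandatory)][string]$Xml)
    # XDocument.ToString() returns the markup indented, without the XML declaration.
    [System.Xml.Linq.XDocument]::Parse($Xml).ToString()
}

# Invented sample: a single-line string like the ones stored in the package.
Format-XmlText -Xml '<Shapes><Shape ID="1"><Text>Hello</Text></Shape></Shapes>'
```

The trade-off is that the whole file has to be well-formed XML and is parsed in one go, whereas the simple -replace works line by line on whatever text it is given.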

One thing I did find interesting was that the script also works for other Open XML files, such as those from PowerPoint and Word.

John… Visio MVP in x-aisle
JohnVisioMVP.ca

Published by johnvisiomvp

The original Visio MVP. I have worked with the Visio team since 1993
