Thursday 31 July 2008

Using help to help yourself (part 2): publishing batch help information in HTML format

Well, it took quite a bit of head-scratching but here it is: a batch script that outputs information about commands in HTML format.

This is the code (NOTE: it works with French accented words)

[commands_site.bat]

@echo off

:init
setlocal enabledelayedexpansion
set out_dir=site
set page_sub=pages
set page_dir=%out_dir%\%page_sub%
if not exist %page_dir% mkdir %page_dir%
goto create_site

:create_site
set ifile=%out_dir%\index.html
> %ifile% echo ^<html^>^<head^>^<title^>Commands index^</title^>^</head^>
>> %ifile% echo ^<body^>
for /F "tokens=1" %%C in ('help') do (
call :set_is_command %%C
if "!is_command!"=="1" (
echo Handling: %%C
set cfile=%page_dir%\%%C.html
>> %ifile% echo ^<a href="%page_sub%\%%C.html"^>%%C^</a^>^<br/^>
rem set cfile=CON
> !cfile! echo ^<html^>^<head^>^<title^>%%C^</title^>^</head^>
>> !cfile! echo ^<body^>^<h2^>%%C^</h2^>^<pre^>
for /F "delims=" %%T in ('help %%C') do (
set str=%%T
set str=!str:…=^&agrave;!
set str=!str:‚=^&eacute;!
set str=!str:Š=^&egrave;!
set str=!str:ˆ=^&ecirc;!
set str=!str:‰=^&euml;!
set str=!str:Œ=^&icirc;!
set str=!str:‹=^&iuml;!
set str=!str:“=^&ocirc;!
set str=!str:—=^&ugrave;!
set str=!str:–=^&ucirc;!
set str=!str:‡=^&ccedil;!
set str=!str:ÿ=^&nbsp;!
>> !cfile! echo !str!
)
>> !cfile! echo ^<br/^>^<br/^>
>> !cfile! echo ^<a href="../index.html"^>Back to index/Retour ^&agrave; l'index^</a^>
>> !cfile! echo ^</pre^>^</body^>^</html^>
)
)
>> %ifile% echo ^</body^>
>> %ifile% echo ^</html^>

goto end

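:set_is_command
rem Sets is_command to 1 when the argument holds only uppercase letters, otherwise 0
set name=%~1
set is_command=1
if not "%name%"=="" (
set num=1
rem Uppercase letters act as delimiters, so the loop body only runs if something else is left
for /F "usebackq tokens=1 delims=ABCDEFGHIJKLMNOPQRSTUVWXYZ" %%W IN ('%name%') do (
set num=0
)
set is_command=!num!
)
goto blackhole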

:end
echo Press any key to quit...
pause > NUL

:blackhole

So, how does it work?

One
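We enable delayed expansion (setlocal enabledelayedexpansion). Inside a parenthesized FOR block, %var% is expanded only once, when the whole block is parsed. With delayed expansion, !var! is expanded each time the line runs, so we can set a variable inside the loop and read its new value straight away.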
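
To see the difference for yourself, here is a minimal sketch (the count variable and the a b c list are made up for the demonstration, they are not part of the script above):

```batch
@echo off
setlocal enabledelayedexpansion
set count=0
rem The percent form is expanded once, when the whole block is parsed
rem The exclamation form is expanded every time the echo line runs
for %%N in (a b c) do (
    set /a count+=1
    echo percent form: %count%   delayed form: !count!
)
endlocal
```

The percent form should stay at 0 on every pass, while the delayed form counts 1, 2, 3.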

Two
We prepare the output directories (mkdir). The root of the "site" is the site directory (this is where the site index will get stored). In this directory, a subdirectory is created which will hold all the pages (one for each command): the pages subdirectory.

Three
A loop (for /F "tokens=1" %%C in ('help')...) goes through every line output by the help command and keeps only the first token of each line. Each command is listed as its name followed by a short explanation that can run over several lines, as in this example:

VERIFY Indique à Windows 2000 s'il doit ou non vérifier que les fichiers
sont écrits correctement sur un disque donné.

Notice the accented characters shown in the explanation (é, à).
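
If you want to look at the raw tokens before any filtering, a small sketch like this one should do (it only echoes, it writes no files):

```batch
@echo off
rem Print the first whitespace-separated token of every line that help outputs
for /F "tokens=1" %%C in ('help') do echo First token: %%C
```

With the VERIFY example above you would get both VERIFY and sont in the list, which is why the next step is needed.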

Four
Obviously we only want to retrieve the command names (e.g. VERIFY) and nothing else. To do this we must check the first token of every line. In our case the first token will be VERIFY for the first line, and "sont" for the second line. Only the first of these two tokens is an actual command name. Fortunately, it has only uppercase letters.
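The set_is_command subroutine sets the is_command variable to 1 if the token is a command (i.e. all uppercase letters) or 0 if it isn't (one or more lowercase letters, e.g. "sont"). It does this by splitting the token with every uppercase letter as a delimiter: if anything is left over, the token is not a command name.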
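
Here is the same test on its own, as a rough sketch (the upper variable and the three sample words are mine, only the delims trick comes from the script):

```batch
@echo off
setlocal enabledelayedexpansion
for %%N in (VERIFY sont Windows) do (
    set upper=1
    rem With usebackq, single quotes mean a literal string rather than a command
    rem If only uppercase letters are present nothing is left and the body is skipped
    for /F "usebackq delims=ABCDEFGHIJKLMNOPQRSTUVWXYZ" %%W in ('%%N') do set upper=0
    echo %%N: !upper!
)
endlocal
```

VERIFY should come out as 1, and both sont and Windows as 0 (a single lowercase letter is enough to fail the test). Note that digits and punctuation fail it as well, since they are not in the delimiter list.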

Five
The script will output: one file for each command (containing its detailed help information) and an index file which will reference all these command pages.
For every command name, we:
- create a link in the index file
- create a file containing the detailed information (the result of help %commandname%)
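
Writing HTML from a batch file needs a bit of care, because the angle brackets are redirection operators. A minimal sketch of the pattern used in the script, assuming a throwaway file called demo.html (a name I made up for this example):

```batch
@echo off
set demo=demo.html
rem A single redirect creates or overwrites the file, a double one appends to it
rem Every angle bracket meant as text is escaped with a caret
> %demo% echo ^<html^>^<body^>
>> %demo% echo ^<a href="pages\VERIFY.html"^>VERIFY^</a^>^<br/^>
>> %demo% echo ^</body^>^</html^>
type %demo%
```

Putting the redirection at the start of the line, as the script does, keeps it away from the text being echoed.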

Six
Notice there is a strange section with a few lines like:
set str=!str:Œ=^&icirc;!

At first, when I tried to output the result of 'help %commandname%' to an HTML file, the accented characters got replaced by different characters. For instance: "é" was shown as "‚", "è" was shown as "Š", and so on. After some Googling and much head-scratching, I came up with this.
The idea is to replace the "bad" character with a character which will give me the desired result. Because I am working with HTML a good way to do this is to replace these accented characters with the corresponding HTML entities. Thus, é becomes &eacute;, à becomes &agrave;, etc.
I used a simple replacement syntax set str=!str:S=R! where S represents what you're searching for and R represents the desired replacement. Neat little trick.
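
To try the replacement syntax without worrying about code pages, here is a small sketch that uses an underscore as a stand-in for the accented character (the stand-in and the sample word are assumptions for the demo):

```batch
@echo off
setlocal enabledelayedexpansion
set str=r_sum_
rem Replace every underscore with the HTML entity for e-acute
rem The caret stops the ampersand from being read as a command separator
set str=!str:_=^&eacute;!
echo !str!
endlocal
```

This should print r&eacute;sum&eacute;. Reading the variable back with !str! rather than %str% matters here: with the percent form, the ampersand would be interpreted as a command separator.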

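There is a catch, however. To work properly with accented characters, you need to open the batch file using EDIT (cmd > EDIT). The likely reason is that the console and EDIT use an OEM code page (typically 850 on a French system) whereas Notepad uses the Windows code page (1252), so the same byte shows up as a different character in each. If you open the above code using EDIT you will find that "…" becomes "à", "‚" becomes "é", "Š" becomes "è", etc. Once you have finished entering the accented characters, you can go back to Notepad, for instance, but the characters will look "strange" (i.e. as in the code above).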
I guess you could do the same thing in the other direction for ordinary text: type the accented characters in Notepad (for instance), then open the file in EDIT. The Notepad characters will look weird in EDIT, but you can then use the S=R syntax in EDIT to pair each "weird" character with the original one. I haven't tried this but I expect it works.

Well, I think that's it. I hope someone will find this useful!

Thoughts?
