Re: How to batch convert binary files to "Text" by mayayana
mayayana
Fri Apr 06 09:30:59 CDT 2007
Do you maybe mean unicode rather than binary?
All files are "binary" in the sense that they're
composed of a series of bytes. A text file is just
one where all the bytes correspond to characters.
Ascii text uses 1 byte per character (at least in
the US and Europe) while unicode text, as Windows
saves it (UTF-16), uses 2 bytes for nearly every
character. For example, if you look in a hex editor
at a text file that starts with the word "file", an
ascii version will start with the bytes 102-105-108-101
(decimal) or 66-69-6C-65 (hex). Those correspond
to f-i-l-e. The unicode version would be:
66-00-69-00-6C-00-65-00
Unicode here represents each character as a 16-bit
numeric value rather than an 8-bit value, with the
low byte stored first.
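To see those byte values without a hex editor, here is a rough sketch. HexBytes is a made-up helper name, not part of any library, and the non-unicode branch is only meaningful for plain ascii characters (codes below 128):

```vbscript
' HexBytes (hypothetical helper): show the bytes a string would occupy
' as 1-byte ascii or as 2-byte unicode (low byte first).
Function HexBytes(s, asUnicode)
    Dim i, code, out
    out = ""
    For i = 1 To Len(s)
        code = AscW(Mid(s, i, 1))
        If code < 0 Then code = code + 65536   ' AscW can return negative values
        out = out & Right("0" & Hex(code And 255), 2)
        If asUnicode Then
            ' second byte of the 16-bit value
            out = out & "-" & Right("0" & Hex(code \ 256), 2)
        End If
        If i < Len(s) Then out = out & "-"
    Next
    HexBytes = out
End Function
```

HexBytes("file", False) gives 66-69-6C-65 and HexBytes("file", True) gives 66-00-69-00-6C-00-65-00.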
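If you need to convert unicode to ascii you can do it
by opening and resaving the file using Textstream.
(Note the extra parameters in the Textstream
methods that allow you to choose between unicode
and ascii: the fourth argument of OpenTextFile is -1
for unicode or 0 for ascii, and the third argument of
CreateTextFile is True for unicode or False for ascii.)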
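A minimal sketch of that open-and-resave idea, applied only to the files in a folder that look like unicode. It rests on an assumption that is not stated above: that the unicode files start with the byte-order mark FF FE, as files saved by Notepad do. A unicode file without that mark would be skipped. The check reads the first two bytes as ascii and assumes a Western code page. IsUnicodeFile, ConvertToAscii and ConvertFolder are names invented for this example.

```vbscript
Const ForReading = 1
Const TristateFalse = 0    ' open as ascii
Const TristateTrue = -1    ' open as unicode

' Assumption: a unicode file begins with the bytes FF FE.
Function IsUnicodeFile(fso, path)
    Dim ts, head
    IsUnicodeFile = False
    If fso.GetFile(path).Size < 2 Then Exit Function
    Set ts = fso.OpenTextFile(path, ForReading, False, TristateFalse)
    head = ts.Read(2)
    ts.Close
    IsUnicodeFile = (Asc(Mid(head, 1, 1)) = 255 And Asc(Mid(head, 2, 1)) = 254)
End Function

' Read the file as unicode, then overwrite it as ascii.
' The Write call fails if the text holds characters that
' have no equivalent in the system code page.
Sub ConvertToAscii(fso, path)
    Dim ts, s
    Set ts = fso.OpenTextFile(path, ForReading, False, TristateTrue)
    s = ts.ReadAll
    ts.Close
    Set ts = fso.CreateTextFile(path, True, False)
    ts.Write s
    ts.Close
End Sub

' Collect the unicode files first, then convert them, so the
' folder listing is not being changed while it is walked.
Sub ConvertFolder(folderPath)
    Dim fso, f, paths, p
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set paths = CreateObject("Scripting.Dictionary")
    For Each f In fso.GetFolder(folderPath).Files
        If IsUnicodeFile(fso, f.Path) Then paths.Add f.Path, True
    Next
    For Each p In paths.Keys
        ConvertToAscii fso, p
    Next
End Sub
```

Try it on a copy of the folder first, since the files are overwritten in place.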
If you're not talking about unicode then maybe it's
some sort of encryption? In that case you'd need to
figure out what sort of encryption.
Another possibility: If you download from a Unix server
you might find that you have an HTML file with some
squares in it but no line breaks. In that case it's
because of the different line-ending format: Unix ends
a line with a linefeed alone, while Windows expects a
carriage return plus linefeed. You can fix that with the
following. (Paste this into Notepad, save it as a .vbs
file, and drop your distorted file onto it):
---------------------------
Dim fso, ts, s, arg, s1
Set fso = CreateObject("Scripting.FileSystemObject")
If WScript.Arguments.Count = 0 Then
    arg = InputBox("This script will correct web server text that " & _
        "lacks carriage returns. Enter path of file.", "Fix File", _
        "C:\Windows\Desktop\")
Else
    arg = WScript.Arguments.Item(0)
End If
If fso.FileExists(arg) = False Then
    MsgBox "Wrong path", 64, "No such file"
    WScript.Quit
End If
'-------- got the file. read it into s. --------
Set ts = fso.OpenTextFile(arg, 1, False)
s = ts.ReadAll
ts.Close
Set ts = Nothing
'-------- replace every line ending with vbCrLf --------
' Collapse CrLf to Cr first so existing pairs are not doubled,
' then turn each lone Lf into Cr, then expand every Cr to CrLf.
s1 = Replace(s, vbCrLf, vbCr, 1, -1, 0)
s1 = Replace(s1, vbLf, vbCr, 1, -1, 0)
s1 = Replace(s1, vbCr, vbCrLf, 1, -1, 0)
'-------- write file. --------
If fso.FileExists(arg) = True Then
    fso.DeleteFile arg, True
End If
Set ts = fso.CreateTextFile(arg, True)
ts.Write s1
ts.Close
Set ts = Nothing
Set fso = Nothing
MsgBox "All done", 64, "File fixed"
--------------------
If it's carriage return problems that you're talking
about, you could check for a distorted file by
opening the file, reading it as a string, and checking
to see whether it contains CrLf combinations.
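One way that check might look, as a sketch only. NeedsLineEndingFix is a name made up for this example. It goes a little further than just looking for CrLf: it strips the CrLf pairs and then looks for any leftover lone linefeed or carriage return, so a file with mixed line endings is caught too.

```vbscript
' Returns True if the string s contains a linefeed or carriage
' return that is not part of a CrLf pair.
Function NeedsLineEndingFix(s)
    Dim bare
    bare = Replace(s, vbCrLf, "")   ' drop the well-formed pairs
    NeedsLineEndingFix = (InStr(bare, vbLf) > 0) Or (InStr(bare, vbCr) > 0)
End Function
```

Read each file into a string with ReadAll, pass it to the function, and only rewrite the files where it returns True.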
> I have the following problem: some thousands html files were encoded
> as binary so they are quite not correctly accesible..
> I wrote (better cut&paste...) a simple vbs that opens each HTM file
> in a folder as a binary stream and rewrite it as text.
> This works fine on the "problem" files.
> Now the new problem is: how can I identify if a HTML file is binary or
> not? Since in my folders file are mixed, some are good html and some
> not (binary) I can't obviously work with extension.. is there
> something that may work this way:
> for each file in my directory check if it's binary, if so go on with
> the cool stuff else movenext.
> I'm gonna read more, but if someone could help...or even suggest
> another approach i'd really appreciate it.
>
> Thanks!
>