Python version of Perl's "if (-T ...)" and "if (-B ...)"?

Perl has the following constructs to check whether a file is considered
to contain "text" or "binary" data:

if (-T $filename) { print "file contains 'text' characters\n"; }
if (-B $filename) { print "file contains 'binary' characters\n"; }

Is there already a Python analog to these? I'm happy to write them on
my own if no such constructs currently exist, but before I start, I'd
like to make sure that I'm not "re-inventing the wheel".

By the way, here's what the perl docs say about these constructs. I'm
looking for something similar in Python:

.... The -T  and -B  switches work as follows. The first block or so
.... of the file is examined for odd characters such as strange control
.... codes or characters with the high bit set. If too many strange
.... characters (>30%) are found, it's a -B file; otherwise it's a -T
.... file. Also, any file containing null in the first block is
.... considered a binary file. [ ... ]
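
For comparison, a rough Python translation of the heuristic described in the Perl docs could look like the sketch below. It is not Perl's actual algorithm. The 512-byte block size, the 30% threshold default and the set of bytes counted as "ordinary" (TEXT_BYTES) are assumptions, and is_text/is_binary are hypothetical names.

```python
# Bytes counted as "ordinary" text: printable ASCII plus common whitespace
# and a few control characters. This set is an assumption, not Perl's exact list.
TEXT_BYTES = bytes(range(32, 127)) + b"\b\t\n\f\r\x1b"


def is_binary(path, blocksize=512, threshold=0.30):
    """Rough analog of Perl's -B test, judged from the first block of the file."""
    with open(path, "rb") as f:
        block = f.read(blocksize)
    if not block:
        # Empty file: reported as text by this sketch (Perl's own rule may differ).
        return False
    if b"\x00" in block:
        # Any NUL byte in the first block means binary, as the Perl docs describe.
        return True
    # translate(None, TEXT_BYTES) deletes every "ordinary" byte, leaving only odd ones.
    odd = block.translate(None, TEXT_BYTES)
    return len(odd) / len(block) > threshold


def is_text(path, blocksize=512, threshold=0.30):
    """Rough analog of Perl's -T test: simply the negation of is_binary()."""
    return not is_binary(path, blocksize, threshold)
```

Because any byte with the high bit set counts as odd here, a file of mostly non-ASCII UTF-8 text may well be classified as binary by this sketch.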

Thanks in advance for any suggestions.

-- 
 Lloyd Zusman
 ljz@asfast.com
 God bless you.


ljz
2/12/2010 2:01:20 PM
comp.lang.python

In article <mailman.2434.1265983307.28905.python-list@python.org>,
Lloyd Zusman  <ljz@asfast.com> wrote:
>
>Perl has the following constructs to check whether a file is considered
>to contain "text" or "binary" data:
>
>if (-T $filename) { print "file contains 'text' characters\n"; }
>if (-B $filename) { print "file contains 'binary' characters\n"; }

Assuming you're on a Unix-like system or can install Cygwin, the
standard response is to use the "file" command.  It's *much* more
sophisticated.
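
A minimal sketch of taking that advice from Python, assuming the "file" command is on the PATH: it runs `file -b --mime-encoding` through subprocess, which prints an encoding name such as us-ascii or utf-8 for text and "binary" otherwise. The function names are hypothetical, and treating every non-"binary" answer as text is an assumption of this sketch.

```python
import shutil
import subprocess


def file_mime_encoding(path):
    """Return `file`'s guess at the file's encoding, or None if `file` is missing."""
    exe = shutil.which("file")
    if exe is None:
        return None
    # -b (--brief) drops the "path: " prefix; --mime-encoding prints e.g. "us-ascii" or "binary".
    result = subprocess.run(
        [exe, "-b", "--mime-encoding", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def is_text_per_file(path):
    """Hypothetical helper: treat anything `file` does not call "binary" as text."""
    encoding = file_mime_encoding(path)
    if encoding is None:
        raise RuntimeError("the 'file' command is not available")
    return encoding != "binary"
```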
-- 
Aahz (aahz@pythoncraft.com)           <*>         http://www.pythoncraft.com/

"At Resolver we've found it useful to short-circuit any doubt and just        
refer to comments in code as 'lies'. :-)"
aahz
2/12/2010 2:26:51 PM

aahz@pythoncraft.com (Aahz) writes:

> In article <mailman.2434.1265983307.28905.python-list@python.org>,
> Lloyd Zusman  <ljz@asfast.com> wrote:
> >if (-T $filename) { print "file contains 'text' characters\n"; }
> >if (-B $filename) { print "file contains 'binary' characters\n"; }
>
> Assuming you're on a Unix-like system or can install Cygwin, the
> standard response is to use the "file" command. It's *much* more
> sophisticated.

Indeed, the ‘file’ command is an expected (though not standard) part of
most Unix systems, and its use frees us from the lies of filenames about
their contents.

The sophistication comes from an extensive database of heuristics
(filesystem attributes, “magic” content signatures, and parsing) that
are together known as the “magic database”. This database is maintained
along with the ‘file’ program, and made accessible through a C library,
‘magic’ (libmagic), built from the same code base.
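
To illustrate what a “magic” content signature is, here is a toy Python version using a handful of well-known leading-byte signatures. The table and the names SIGNATURES, sniff and sniff_file are hypothetical and purely illustrative; the real magic database is far larger, and its rules can test bytes at arbitrary offsets, not just at the start of the file.

```python
from typing import Optional

# A toy table of magic signatures: leading bytes and the type they indicate.
# The real magic database has thousands of far richer rules (offsets, masks, nested tests).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image data"),
    (b"%PDF-", "PDF document"),
    (b"\x7fELF", "ELF binary"),
    (b"GIF87a", "GIF image data"),
    (b"GIF89a", "GIF image data"),
    (b"PK\x03\x04", "Zip archive data"),
    (b"\x1f\x8b", "gzip compressed data"),
]


def sniff(data: bytes) -> Optional[str]:
    """Return the description of the first matching signature, or None."""
    for signature, description in SIGNATURES:
        if data.startswith(signature):
            return description
    return None


def sniff_file(path: str) -> Optional[str]:
    """Identify a file by its content, ignoring whatever its name claims."""
    with open(path, "rb") as f:
        # 16 bytes covers the longest signature in the table (8 bytes).
        return sniff(f.read(16))
```

A file named fake.txt that actually begins with GIF89a is reported as "GIF image data" by sniff_file, regardless of its extension.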

So, you don't actually need to use the ‘file’ command to access this
sophistication. Just about every program on a GNU system that wants to
display file types, such as the graphical file manager, will query the
‘magic’ library directly to get the file type.

The ‘file’ code base has for a while now also included a Python
interface to this library, importable under the module name ‘magic’.
Unfortunately it isn't registered at PyPI as far as I can tell. (There
are several projects using the name “magic” that implement something
similar, but they are nowhere near as sophisticated.)

On a Debian GNU/Linux system, you can install the ‘python-magic’ package
to get this installed. Otherwise, you can build it from the ‘file’ code
base <URL:http://www.darwinsys.com/file/>.
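
A minimal sketch of querying the library from Python, assuming the bindings shipped with the ‘file’ sources, whose interface is a cookie object created by magic.open(flags) and then used via load(), file(path) and close(). Other modules named ‘magic’ may expose different APIs, so the sketch checks for `open` and returns None when the module is missing or unusable; the helper name mime_type is hypothetical.

```python
def mime_type(path):
    """Return libmagic's MIME type for path, or None if the bindings aren't available."""
    try:
        import magic
    except ImportError:
        return None
    if not hasattr(magic, "open"):
        # Some other module named "magic" with a different API.
        return None
    cookie = magic.open(magic.MAGIC_MIME_TYPE)
    try:
        cookie.load()  # load the default magic database
        return cookie.file(path)
    finally:
        cookie.close()
```

On an ordinary text file this would typically report something like text/plain.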

-- 
 \        “I don't accept the currently fashionable assertion that any |
  `\       view is automatically as worthy of respect as any equal and |
_o__)                                   opposite view.” —Douglas Adams |
Ben Finney
python6
2/12/2010 8:41:40 PM