I'm developing a virtual printer driver, and it handles most documents
well. Where it fails is with PDF documents containing East Asian
characters. I've found some other posts which mention that Adobe
provides glyph indices rather than Unicode characters. How do I get
the actual text in this situation?

Thanks,
chris

Re: Extracting text from PDF documents for virtual printer driver by Tim

Tim
Thu Mar 27 23:21:10 CDT 2008

Chris <christopher.burns@gmail.com> wrote:
>
>I'm developing a virtual printer driver, and it handles most documents
>well. Where it fails is with PDF documents containing East Asian
>characters. I've found some other posts which mention that Adobe
>provides glyph indices rather than Unicode characters. How do I get
>the actual text in this situation?

It depends on the font. Non-TrueType fonts, and some TrueType fonts, do
not use Unicode encoding. You need to chase down the font in use to learn
what encoding it uses.
--
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.

Re: Extracting text from PDF documents for virtual printer driver by Chris

Chris
Fri Mar 28 08:29:57 CDT 2008

On Mar 27, 11:21 pm, Tim Roberts <t...@probo.com> wrote:
> It depends on the font. Non-TrueType fonts, and some TrueType fonts, do
> not use Unicode encoding. You need to chase down the font in use to learn
> what encoding it uses.

Thanks, Tim. Actually, I just realized that Adobe provides glyph
indices for all fonts, so my current solution is really just a hack.

Are you saying there's no font-agnostic way to extract text from a PDF
file?

Re: Extracting text from PDF documents for virtual printer driver by Tim

Tim
Sat Mar 29 18:39:22 CDT 2008

Chris <christopher.burns@gmail.com> wrote:
>
>Thanks, Tim. Actually, I just realized that Adobe provides glyph
>indices for all fonts, so my current solution is really just a hack.
>
>Are you saying there's no font-agnostic way to extract text from a PDF
>file?

No. That's a very different question from the one you originally asked. A
PDF file's page content is made of PostScript-like operators (PDF grew out
of PostScript, though it is not PostScript itself). Once you decompress the
PDF streams, that content is easily readable.

You originally asked if there was a font-agnostic way to extract the text
from a PDF file **from a printer driver**. The answer to that is "no".

RE: Extracting text from PDF documents for virtual printer driver by ssenthilkumar

ssenthilkumar
Wed Apr 02 04:05:01 CDT 2008

Hi,

I need your help developing a printer driver that converts printable
files into TIFF images. I am looking for a starting point, but I have
not found one yet.

If you could help me with guidelines and any samples, that would be
helpful.

Thanks and regards
s.senthil kumar

s.senthilkumar@capdigisoft.net

Re: Extracting text from PDF documents for virtual printer driver by Tim

Tim
Wed Apr 02 23:53:02 CDT 2008

s.senthil kumar <ssenthilkumar@discussions.microsoft.com> wrote:

>Hi,
>
>I need your help developing a printer driver that converts printable
>files into TIFF images. I am looking for a starting point, but I have
>not found one yet.
>
>If you could help me with guidelines and any samples, that would be
>helpful.

Start with one of the samples in the WDK. Have GDI draw everything into a
bitmap, then compress the bitmap to TIFF when you're done.