If a size_t is cast to a long, and the size_t holds the length of a Unicode string, does the resulting long need to be divided by sizeof(_TCHAR) to get the actual length in _TCHARs?

Re: size_t / long by Carl

Carl
Tue May 11 09:05:21 CDT 2004

songie D wrote:
> If a size_t is cast to a long, and size_t is the length of a unicode
> string, does the resulting long need to be divided by sizeof(_TCHAR)
> in order to get the actual length in _TCHARs?

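If the size_t contains the size in bytes of a TCHAR string, you should always
divide by sizeof(TCHAR) to get the length of the string in TCHARs. If it
already holds a count of characters (as _tcslen and _tcsspn return), no
division is needed.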

-cd



Re: size_t / long by Carl

Carl
Tue May 11 10:32:25 CDT 2004

songie D wrote:
> ok Carl.
> maybe you can help me:
> I've got a string (of _TCHARs), for instance
> _TCHAR* mystring = "quick brown fox jumps over lazy dog";
>
> I then use this:
> wordlen = (long)_tcsspn(mystring, "abcdefgh...wxyz");
> (where the second argument is the whole alphabet but without space)
>
> which of course returns 6. This is what I would expect, as 6 is the

It returns 5...

> position of the first character that ISN'T an alphabetic character,
> i.e. a space.
>
> I now want to extract the word "quick" and copy it into a string of
> its own.
> For this I'm allocating a dynamic array of _TCHARs on the heap (I
> don't know how long the word might be).
> using:
> _TCHAR* word = new _TCHAR[wordlen];
> _tcsncpy(word, mystring, wordlen);
> ...<do some operations on the 'word' variable>
> delete[] word;

You NEED to allocate space for the terminating null character, so do new
TCHAR[wordlen+1]. As is, you're getting an unterminated word in your array
and seeing garbage that's past the end of your allocation. Note that in
general, new may allocate more memory than you ask for due to
alignment/granularity requirements.

btw, all of this is much easier and less error-prone if you use std::string
instead of dealing with low-level details yourself:

typedef std::basic_string<TCHAR> tstring;

tstring mystring(_T("quick brown fox jumps over lazy dog"));
tstring alphabet(_T("abcdefgh...wxyz"));
tstring word = mystring.substr(0,mystring.find_first_not_of(alphabet,0));

-cd




Re: size_t / long by Vincent

Vincent
Tue May 11 10:59:39 CDT 2004

_tcsspn() should return 5 (and does in my tests): that is the length of the
substring, and also the 0-based position of the space. Since wordlen is
less than the length of mystring, _tcsncpy() does not nul-terminate word,
and so you see garbage (in a place you shouldn't be looking) after "quick".
Finally, word[wordlen] is an illegal reference to a position outside of word
(as you have it, the last char in the array word is word[wordlen-1]). You
should be able to fix it all with

_TCHAR* word = new _TCHAR[wordlen+1];

to leave room for a nul-terminator, and making

word[wordlen] = 0;

valid.

On Tue, 11 May 2004 08:21:04 -0700, "songie D"
<anonymous@discussions.microsoft.com> wrote:

>ok Carl.
>maybe you can help me:
>I've got a string (of _TCHARs), for instance
>_TCHAR* mystring = "quick brown fox jumps over lazy dog";
>
>I then use this:
>wordlen = (long)_tcsspn(mystring, "abcdefgh...wxyz");
>(where the second argument is the whole alphabet but without space)
>
>which of course returns 6. This is what I would expect, as 6 is the
>position of the first character that ISN'T an alphabetic character, i.e.
>a space.
>
>I now want to extract the word "quick" and copy it into a string of its own.
>For this I'm allocating a dynamic array of _TCHARs on the heap (I don't
>know how long the word might be).
>using:
>_TCHAR* word = new _TCHAR[wordlen];
>_tcsncpy(word, mystring, wordlen);
>...<do some operations on the 'word' variable>
>delete[] word;
>
>However, the problem I'm finding is that new allocates word far too long. As a result,
>'quick' never fills the word, and it is padded with square block characters. I *need* this word variable
>to consist ONLY of the word, which is why I'm allocating a new variable for it.
>But if I manually put in a null terminating character, i.e.
>word[wordlen] = NULL;
>then delete[] fails.
>
>How can I do this successfully?
>BTW Unicode is ON for this DLL, which means that sizeof(_TCHAR) is 2. I've checked this.
>But for some weird reason the memory allocated is far too big.
>Should I be using some other memory allocation function?
>
>Thanks

--
- Vince

Re: size_t / long by songie

songie
Tue May 11 13:55:02 CDT 2004

>
> It returns 5...

yes, sorry 5. I meant 5.

> You NEED to allocate space for the terminating NULL, so do new
> TCHAR[wordlen+1]. As is, you're getting an un-terminated word in your array
> and seeing garbage that's past the end of your allocation. Note that in
> general, new will allocate more memory than you ask for due to
> alignment/granularity requirements.

OK, so supposing I do, then I'll get even more memory. But I see the point:
it *might* fill it. I'll put it another way...
no I won't, I'll just repeat the question. How can I get the variable to
contain JUST the string, with a terminating null if that's what it entails,
and then still successfully delete[] it?


>
> btw, all of this is much easier and less error-prone if you use std::string
> instead of dealing with low-level details yourself:

Nah, that'd defeat the point of writing this part of the program in
unmanaged C++. I might as well write it in C#, which the rest of the
program is written in.
This is a routine that's going to be called probably every time
the user types a key, possibly many times per keystroke.

>
> typedef std::basic_string<TCHAR> tstring;
>
> tstring mystring(_T("quick brown fox jumps over lazy dog"));
> tstring alphabet(_T("abcdefgh...wxyz"));
> tstring word = mystring.substr(0,mystring.find_first_not_of(alphabet,0));


Mmm. I wonder where find_first_not_of() comes from, the tooth fairy?



Re: size_t / long by songie

songie
Tue May 11 13:57:50 CDT 2004

Excellent
You reckon delete[] should still work if I put a null terminating character
in as the last array element that I've requested (not necessarily the last
one that I've been given)??
Anyway thanks.
I'll try it.

"Vincent Fatica" <abuse@localhost> wrote in message
news:40a0f86b$1@localhost...
> _tcsspn() should (does in my tests) return 5 (i.e., the length of the
> substring (and also the 0-based position of the space)). Since wordlen is
> less than the length of mystring, _tcsncpy() does not nul-terminate word,
> and so you see garbage (in a place you shouldn't be looking) after "quick".
> Finally, word[wordlen] is an illegal reference to a position outside of word
> (as you have it, the last char in the array word is word[wordlen-1]). You
> should be able to fix it all with
>
> _TCHAR* word = new _TCHAR[wordlen+1];
>
> to leave room for a nul-terminator, and making
>
> word[wordlen] = 0;
>
> valid.
>



Re: size_t / long by Carl

Carl
Tue May 11 14:16:47 CDT 2004

songie D wrote:
>> It returns 5...
>
> yes, sorry 5. I meant 5.
>
>> You NEED to allocate space for the terminating NULL, so do new
>> TCHAR[wordlen+1]. As is, you're getting an un-terminated word in
>> your array and seeing garbage that's past the end of your
>> allocation. Note that in general, new will allocate more memory
>> than you ask for due to alignment/granularity requirements.
>
> OK so supposing I do, then I'll get even more memory. But I see the
> point, it *might* fill it. I'll put it another way...
> no I won't, I'll just repeat the question. How can I get the variable
> to contain JUST the string, with terminating null if that's what it
> entails, and then still successfully delete[] it?

You can: allocate wordlen+1. There's no way you can force new[] to allocate
exactly as much space as you request - it's free to allocate more. That
said, any attempt by you to access beyond the size you requested is
undefined behavior. It's free to allocate exactly the amount you request
one time, and 10X the amount you request the next time - you simply cannot
assume anything beyond:

1. the allocation was at least as large as you requested.
2. you can safely access all of the elements that you requested (i.e. for
new T[n], you can access indexes 0..n-1).
3. assuming you haven't violated #2, that you can pass the same pointer
returned by new[] to delete[].

>> btw, all of this is much easier and less error-prone if you use
>> std::string instead of dealing with low-level details yourself:
>
> Nah, that'd defeat the point of writing this part of the program in
> unmanaged C++. I might aswell write it in C#, that the rest of the
> program's written in.

Then there's likely no valid reason to write it in unmanaged C++. You're
apparently operating under the fallacious assumption that managed code is
slow, or that this function is going to be a bottleneck in your program
(have you profiled it to find out?).

> This is a routine that's going to be called probably every time
> the user types a key, possibly many times per the user types a key.

So? Users typing keys are monumentally slow - you can run 10's (maybe
100's) of millions of CPU instructions between keystrokes on a modern CPU.
Besides, it's likely that the std::string solution, if properly written,
will be the same speed as your fragile hand-crafted solution.

>> typedef std::basic_string<TCHAR> tstring;
>>
>> tstring mystring(_T("quick brown fox jumps over lazy dog"));
>> tstring alphabet(_T("abcdefgh...wxyz"));
>> tstring word =
>> mystring.substr(0,mystring.find_first_not_of(alphabet,0));
>
> mmm. wonder where find_first_not_of() comes from , the tooth fairy?

find_first_not_of is a member function of std::basic_string<CharT> - note
the . between mystring and find_first_not_of in the above sample. It comes
not from the tooth fairy, nor from Microsoft, but from the ISO C++ standard.

-cd



Re: size_t / long by Carl

Carl
Tue May 11 14:17:28 CDT 2004

songie D wrote:
> Excellent
> You reckon delete[] should still work if I put a null terminating
> character in as the last array element that I've requested (not
> necessarily the last one that I've been given)??
> Anyway thanks.
> I'll try it.

Yes, of course - just don't access beyond what you requested and delete[]
will work.
-cd




Re: size_t / long by songie

songie
Tue May 11 15:05:45 CDT 2004

> You can: allocate wordlen+1. There's no way you can force new[] to allocate
> exactly as much space as you request - it's free to allocate more. That
> said, any attempt by you to access beyond the size you requested is
> undefined behavior. It's free to allocate exactly the amount you request
> one time, and 10X the amount you request the next time - you simply cannot
> assume anything beyond:

ok. I get the picture. I'll try it.

>
> 1. the allocation was at least as large as you requested.
> 2. you can safely access all of the elements that you requested (i.e. for
> new T[n], you can access indexes 0..n-1).
> 3. assuming you haven't violated #2, that you can pass the same pointer
> returned by new[] to delete[].

Presumably you can also assume that the memory will be contiguous?
(Thus, for the pointer p returned by new T[n], &p[i] == p + i.)
I think I was just under the false impression that even though I'd
discovered that I wasn't allocating enough space, the fact that I happened
to have got more meant that this couldn't be the problem. I'm getting the
picture that it would be
undefined behaviour anyway.

>
> Then there's likely no valid reason to write it in unmanaged C++.

If you mean write the whole lot in unmanaged C++, no - I don't want
to do that. The reason simply being that it would take me far too long.
The algorithms are what should be taking my programming time, not
spending hours writing code to display a user interface.

> You're apparently operating under the fallacious assumption that managed
> code is slow, or that this function is going to be a bottleneck in your
> program (have you profiled it to find out?).

No, I'm not saying managed code is slow. I'm just saying it's a known fact
that it's slightly slowER than unmanaged C++ code.
If this program were being developed commercially, then some bod with
a degree in design architecture who never actually has to write any code
would decide which bits are going to be written in which language.
I figured that since I'm more of a "just do it" programmer
(both in profession and in hobby) I should take this step myself, rather
than simply writing it all in the same language.

>
> So? Users typing keys are monumentally slow - you can run 10's (maybe
> 100's) of millions of CPU instructions between keystrokes on a modern CPU.

Not the speed I type at (50 - 70 wpm). Which, yes, at ~1Hz (about one word
per second) is slow in electronics terms, you're right. But I don't want to
be cutting any slack
nevertheless.
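A rough worked estimate of the keystroke budget, under assumptions that are illustrative only: 5 characters per word (the usual convention behind wpm figures), the faster 70 wpm rate quoted above, and a 2 GHz CPU retiring about one instruction per cycle. The ~1 Hz figure counts words; keystrokes arrive several times faster.

```latex
\begin{aligned}
\text{keystroke rate} &\approx \frac{70\ \text{words/min} \times 5\ \text{keys/word}}{60\ \text{s/min}} \approx 5.8\ \text{keys/s} \\
\Delta t &\approx \frac{1}{5.8}\ \text{s} \approx 0.17\ \text{s} \\
\text{cycles per keystroke} &\approx 0.17\ \text{s} \times 2 \times 10^{9}\ \text{cycles/s} \approx 3.4 \times 10^{8}
\end{aligned}
```

At 50 wpm the interval is about 0.24 s, or roughly 4.8 × 10^8 cycles, so both ends of the range land in the hundreds of millions, consistent with the estimate given earlier in the thread.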


> Besides, it's likely that the std::string solution, if properly written,
> will be the same speed as your fragile hand-crafted solution.

I doubt it'll be fragile, since it'll be tested with all possible inputs
(and checked for memory leaks if I'm feeling pedantic). I wouldn't have
thought that writing with a class library that I have no knowledge of
and that is generic enough to handle many different scenarios, would be as
fast as writing with native functions that I do have knowledge of and that
are specifically only programmed to do the task I have in mind.

> find_first_not_of is a member function of std::basic_string<CharT> - note
> the . between mystring and find_first_not_of in the above sample. It comes
> not from the tooth fairy, nor from Microsoft, but from the ISO C++ standard.

oh ok, I stand corrected then. Although the STL is made by Hewlett-Packard,
you realise. But under the hood it's still probably a similar algorithm to
_tcscspn, and is just another header file of mainly unnecessary
bumph compiled into the application.


>
> -cd
>
>



Re: size_t / long by Carl

Carl
Tue May 11 17:38:53 CDT 2004

songie D wrote:
> oh ok, I stand corrected then. Although the STL is made by
> hewlett-packard, you realise. But under the hood it's still probably
> a similar algorithm to _tcscspn, and is just another header file of
> mainly unnecessary bumph compiled into the application.

The STL was a proposal by Alex Stepanov (et al.) of Hewlett-Packard to the
C++ standards committee. The C++ standard incorporates the components
originally included in STL. Note that std::string was not part of the STL
proposal, but rather was created by the C++ committee based on other
proposals. In fact, the string class was in the proposed standard before
STL was proposed, with a number of changes being made to the string class
after STL was introduced to make the string more "STL-like".

And yes, find_first_not_of, under the covers, is no doubt a very similar
algorithm to _tcsspn (its closest analogue; _tcscspn corresponds to
find_first_of), but it's standard and portable.

-cd



Re: size_t / long by anonymous

anonymous
Wed May 12 03:11:04 CDT 2004

It'd be interesting to know the internal reason
why it failed *even though* it had given me
more memory than I had asked for.

Re: size_t / long by Simon

Simon
Wed May 12 11:50:07 CDT 2004

"songie D" <anonymous@discussions.microsoft.com> wrote in message
news:276AD83F-0B5A-42EB-B50A-3A711F6F7E55@microsoft.com...
> It'd be interesting to know what the internal reason
> behind why it failed *even though* it had given me
> more memory than I had asked for.

Because you hadn't null terminated the string. It was nothing to do with
delete[]; it was to do with the stuff that you replaced in the post with
"...<do some operations on the 'word' variable>".

S.



Re: size_t / long by songie

songie
Wed May 12 14:35:09 CDT 2004

The 'operations' were largely reading from the word variable, not
writing to it. I was only confused because I actually HAD null-terminated
the string, at a position that, judging by the memory window, it appeared
to have allocated. Apart from knowing I would have to change it anyway, I
didn't think allocating the extra element would fix the problem, since it
didn't seem to have mattered; but it seems it was being pedantic in some
way. And yes, it did fix the problem.
So thanks

"Simon Trew" <noneofyour@business.guv> wrote in message
news:O5uLiEEOEHA.1644@TK2MSFTNGP09.phx.gbl...
> Because you hadn't null terminated the string. It was nothing to do with
> delete[], it was to do with your stuff that you replaced in the post with
> "...<do some operations on the 'word' variable>"/



Re: size_t / long by Simon

Simon
Wed May 12 19:28:29 CDT 2004

"songie D" <songie@D.com> wrote in message
news:%23oyc7gFOEHA.808@tk2msftngp13.phx.gbl...
> The 'operations' were largely reading from the word variable, not
> writing to it.

So what? If it is not properly nul terminated (see later) then reading past
the end of allocated or properly initialised memory can cause nasty things
to happen. FWIW, the little boxes you saw at the end of the string in the
debugger were not necessarily the nul terminator. They were any character
that GDI couldn't render in the debugger's font, and most likely these were
not nul, but junk.

> I was only confused because I actually HAD null-terminated
> the string, in a position that judging by the memory window it appeared to
> have allocated, other than that I would have to change it anyway,

The nul terminator indicates the end of the string; it has nothing to
do with how much memory is allocated to that string. For example:

char bert[100] = "x";

The nul terminator is now at bert[1], and strlen(bert) returns 1, but
sizeof(bert) is definitely 100.

S.