PDA

View Full Version : POP QUIZ: Find the bug in the following code... (semi-OT)


aaaaa00
01-Aug-2004, 02:02
The code is C++.


class CFoo
{
public:
LPCWSTR GetText()
{
return m_sbstrText ? m_sbstrText : L"";
}
private:
CComBSTR m_sbstrText;
};

int main()
{
LPCWSTR pwszFoo = NULL;
CFoo Foo;

pwszFoo = Foo.GetText();

ASSERT( pwszFoo && wcscmp( pwszFoo, L"" ) == 0 );

return 0;
}


Hint 1: This code will crash some of the time. Sometimes it will succeed.

Hint 2: This code compiles and runs on Visual Studio. You need to include windows.h and atlbase.h in your headers. (You also need an assert macro defined.)

Hint 3: This bug is quite subtle. I only saw it after staring at it for quite some time.

Good interview question. :)

Humus
01-Aug-2004, 04:09
m_sbstrText is obviously never initialized.

aaaaa00
01-Aug-2004, 04:31
m_sbstrText is obviously never initialized.

m_sbstrText is an instance of CComBSTR.

The constructor for CComBSTR initializes the contents to NULL.

This is not the bug.

Humus
01-Aug-2004, 07:02
Bah, I read it wrong, as a pointer.

Xmas
01-Aug-2004, 10:08
http://www.devguy.com/fp/Tips/COM/bstr.htm
? operator doesn't work with CComBSTR

frost_add
01-Aug-2004, 10:11
I couldn't get this exact code to crash on my machine single time. The assertion is quite suspicious to me, although I don't know what it is _supposed_ to do - maybe you wanted it this way.

gokickrocks
01-Aug-2004, 10:16
i believe xmas got it right on the nose

something ive been wondering...does the tertiary and biconditionals produce the same assembly code when compiled? or is one more efficient than the other? ie: is the ? statement more efficient than if else statements, vice versa, or the same

frost_add
01-Aug-2004, 10:21
http://www.devguy.com/fp/Tips/COM/bstr.htm
? operator doesn't work with CComBSTR
Ooops, I didn't think of CComBstr for a single moment. Interesting read, thanks for the link. Might be useful one day.

aaaaa00
01-Aug-2004, 10:54
http://www.devguy.com/fp/Tips/COM/bstr.htm
? operator doesn't work with CComBSTR

More or less correct.

(expression) ? a : b always returns objects of the type of a.

So the line


return m_sbstrText ? m_sbstrText : L"";


causes the compiler to create a temporary CComBSTR if (m_sbstrText) resolves to FALSE (which it will, since it's constructed to contain NULL).

The temporary CComBSTR uses the constructor overload for a string, allocates, and then copies L"" into its private member.

Since the function returns LPCWSTR, the compiler then casts the temporary CComBSTR back into a BSTR, which in turn casts back into an LPCWSTR.

But when the scope of the function ends, the destructor for the temporary CComBSTR executes, which frees the internal member.

This results in the function returning a pointer to freed memory. :shock:

Which, depending on how unlucky you are, results in you reading garbage, or causing an AV.

nutball
01-Aug-2004, 11:32
(expression) ? a : b always returns objects of the type of a.

Yyyyukkk!!! Almost enough to make you want to program in perl :)

BRiT
01-Aug-2004, 18:46
First of all, the obvious bug is that it's written in C++ instead of something like Java. ;)

PeterT
01-Aug-2004, 19:55
Java is the scourge of mankind. :twisted:

Interesting quiz - although I frequently use obscure operators or ancient initialization rules to make my C++ programs more interesting to read I did not know this ternary op rule.
Shame that I was too late to participate.

Humus
01-Aug-2004, 23:07
Great java quote I saw the other day:

"Saying java is good because it works on all platforms is like saying anal sex is good because it works on all genders".
8)

Sage
03-Aug-2004, 12:34
Great java quote I saw the other day:

"Saying java is good because it works on all platforms is like saying anal sex is good because it works on all genders".
8)

hey, good point! i never thought of it that way... man Java ROCKS!

DemoCoder
03-Aug-2004, 19:39
Ah, the wonder of C++ automatic type conversion rules. This is the best exhibit of why C++ is poorly designed I have ever seen. I'll save this one to my harddrive.

If there was atleast a warning, you'd know the compiler is creating an object behind your back, an object which is unsafe for automatic type conversion.

NoteMe
03-Aug-2004, 20:13
Great java quote I saw the other day:

"Saying java is good because it works on all platforms is like saying anal sex is good because it works on all genders".
8)

Hehe...I sooo agree...:D...maybe they will kick me out as a teacher at our high school now..:D..I am a Java and a Algorithms in Java teacher..:D.."poo on my shoe"



Øyvind Østlund (http://www.noteme.com)

Mintmaster
03-Aug-2004, 21:40
I'm surprised there wasn't a compiler warning. I know very little if anything about CComBSTR, but "" is clearly a temporary to me. Besides simultaneous character array declaration and initialization, does a string literal ever not create a temporary?

I may be missing something, but I don't see why all that casting business has anything to do with the error. You're returning the address of a temporary, and I've definately had VC++ warn me about that before.

aaaaa00
04-Aug-2004, 11:57
For your own projects, the best solution for this particular class of problems is to flag your constructors with the "explicit" keyword. This prevents the compiler from automagically creating instances of your classes.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclang/html/vcrefExplicit.asp

I'm surprised there wasn't a compiler warning.

This code compiles fine with /W4 /WX (which is the highest possible warning level, the default VC++ warning level is only /W3).

Just step through it in a debugger instead of speculating. You'll see what the compiler did with the crazy temporary CComBSTR firsthand.


I know very little if anything about CComBSTR, but "" is clearly a temporary to me. Besides simultaneous character array declaration and initialization, does a string literal ever not create a temporary?

I may be missing something, but I don't see why all that casting business has anything to do with the error. You're returning the address of a temporary, and I've definately had VC++ warn me about that before.

Constant string literals defined in the body of your program all get stuck into the global .rdata section of your .EXE by the compiler, they are not temporaries. Hence the compiler will not warn you about returning a temporary.

.rdata is a read-only section containing all the constants defined in your app. This is so if you have the same string literal multiple times in your app, the compiler only needs to keep a single copy around.

This is why code like this:


WCHAR* pwszFoo = L"abcde"; // declare a pointer to a string literal, perfectly legal
wprintf( pwszFoo ); // works just fine
pwszFoo[0] = L'!'; // now try to reset the first character to a !


is wrong, and will cause an AV: your app is trying to write a read-only section, and the OS will say "nuh-uh, can't do that, buh-bye", and your app will go poof.

(On some OSes without MMU support or good memory protection, the write may succeed, meaning you'll have corrupted memory. Your program will appear to work for a while, but you may have trashed your constants, which may or may not cause a mystifying crash sometime later on.)

Do a "link /dump /all foo.exe" and you will be able to see the names and contents of all the sections in your .EXE.

See http://msdn.microsoft.com/msdnmag/issues/02/02/PE/default.aspx
and http://msdn.microsoft.com/msdnmag/issues/02/03/PE2/default.aspx if you want more detail.

The problem is C++ is a very rich language, with very few safeguards built in, and an infinite number of ways to shoot yourself in the foot. With thermo-nuclear weapons.

The only solution is to either not use C++, or understand what's really going on underneath the covers in the compiler.

A lot of people try to use the object oriented features of C++ without understanding what's going on underneath, which inevitably leads to problems. A guy I know calls code written this way "object disoriented programming". :D

tb
05-Aug-2004, 03:12
WCHAR* pwszFoo = L"abcde"; // declare a pointer to a string literal, perfectly legal
wprintf( pwszFoo ); // works just fine
pwszFoo[0] = L'!'; // now try to reset the first character to a !



Am I the only one, who never ever used WCHAR *pwszFoo = L"abcde" but WCHAR pwszFoo[] = L"abcde" , WCHAR *pwszFoo = NULL and WCHAR *pwszFoo = pwszFoo2; ....
I'am doing C/C++ for 10 years now....

Thomas

darkblu
05-Aug-2004, 05:08
while on the subject


It is dangerous to use pointers or references to portions of a temporary object. The compiler may very well delete the object before you expect it to, leaving a pointer to garbage. The most common place where this problem crops up is in classes like the libg++ String class, that define a conversion function to type char * or const char *. However, any class that returns a pointer to some internal structure is potentially subject to this problem.

For example, a program may use a function strfunc that returns String objects, and another function charfunc that operates on pointers to char:

String strfunc ();
void charfunc (const char *);

In this situation, it may seem natural to write `charfunc (strfunc ());' based on the knowledge that class String has an explicit conversion to char pointers. However, what really happens is akin to `charfunc (strfunc ().convert ());', where the convert method is a function to do the same data conversion normally performed by a cast. Since the last use of the temporary String object is the call to the conversion function, the compiler may delete that object before actually calling charfunc. The compiler has no way of knowing that deleting the String object will invalidate the pointer. The pointer then points to garbage, so that by the time charfunc is called, it gets an invalid argument. Code like this may run successfully under some other compilers, especially those that delete temporaries relatively late. However, the GNU C++ behavior is also standard-conforming, so if your program depends on late destruction of temporaries it is not portable.

If you think this is surprising, you should be aware that the ANSI C++ committee continues to debate the lifetime-of-temporaries problem.
For now, at least, the safe way to write such code is to give the temporary a name, which forces it to remain until the end of the scope of the name. For example:

String& tmp = strfunc ();
charfunc (tmp);


cheers

aaaaa00
05-Aug-2004, 07:29
WCHAR* pwszFoo = L"abcde"; // declare a pointer to a string literal, perfectly legal
wprintf( pwszFoo ); // works just fine
pwszFoo[0] = L'!'; // now try to reset the first character to a !



Am I the only one, who never ever used WCHAR *pwszFoo = L"abcde" but WCHAR pwszFoo[] = L"abcde" , WCHAR *pwszFoo = NULL and WCHAR *pwszFoo = pwszFoo2; ....
I'am doing C/C++ for 10 years now....

Thomas

They mean different things, and they are both valid.


WCHAR * pwszFoo = L"abcde";


means: declare a pointer to a wide char, and set its value equal to the address of the beginning of the global read-only constant L"abcde".

You may want to do this if you know you will never write to pwszFoo, and you want to avoid a copy initialization.

(And to be picky, you can turn off the string constant pooling behavior by removing the /Gf or /GF flags from your project in VC.)


WCHAR wszFoo[] = L"abcde";


means: declare an array of WCHAR of length equal to the length of the global read-only constant L"abcde", and copy the contents of that constant into the array.

This requires the compiler generate an implicit string copy, so it's more expensive than the previous example, but gets you a writable copy of the string.

Mintmaster
05-Aug-2004, 22:17
Thanks for the detailed reply, aaaaa00. I still say WCHAR * pwszFoo = L"abcde"; is bad style. One of the advantages (and even bigger disadvantage depending on the point of view) is that you take responsibility for everything in C++. Actually using the global .rdata directly in your program seems to be something only seasoned C++ users would know, and not good style IMO.

Colourless
06-Aug-2004, 08:38
(On some OSes without MMU support or good memory protection, the write may succeed, meaning you'll have corrupted memory. Your program will appear to work for a while, but you may have trashed your constants, which may or may not cause a mystifying crash sometime later on.)

WinNT wont allow it.
Win9x will allow it.