Picture 7

Another year is drawing to an end, and it's a perfect time to make yourself a cup of coffee and reread the reviews of bugs collected across open-source projects over this year. This would take quite a while, of course, so we prepared this article to make it easier for you. Today we'll be recalling the most interesting dark spots that we came across in open-source C/C++ projects in 2019.

No. 10. What operating system are we running on?


V1040 Possible typo in the spelling of a pre-defined macro name. The '__MINGW32_' macro is similar to '__MINGW32__'. winapi.h 4112

#if !defined(__UNICODE_STRING_DEFINED) && defined(__MINGW32_)
#define __UNICODE_STRING_DEFINED
#endif

There is a typo in the name of the __MINGW32_ macro (MINGW32 is actually declared by __MINGW32__). Elsewhere in the project, the check is written correctly:

Picture 3


By the way, this bug was not only the first to be described in the article "CMake: the Case when the Project's Quality is Unforgivable" but the very first genuine bug found by the V1040 diagnostic in a real open-source project (August 19, 2019).

No. 9. Who's first?


V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '==' operator. mir_parser.cpp 884

enum Opcode : uint8 {
  kOpUndef,
  ....
  OP_intrinsiccall,
  OP_intrinsiccallassigned,
  ....
  kOpLast,
};

bool MIRParser::ParseStmtIntrinsiccall(StmtNodePtr &stmt, bool isAssigned) {
  Opcode o = !isAssigned ? (....)
                         : (....);
  auto *intrnCallNode = mod.CurFuncCodeMemPool()->New<IntrinsiccallNode>(....);
  lexer.NextToken();
  if (o == !isAssigned ? OP_intrinsiccall : OP_intrinsiccallassigned) {
    intrnCallNode->SetIntrinsic(GetIntrinsicID(lexer.GetTokenKind()));
  } else {
    intrnCallNode->SetIntrinsic(static_cast<MIRIntrinsicID>(....));
  }
  ....
}

We are interested in the following part:

if (o == !isAssigned ? OP_intrinsiccall : OP_intrinsiccallassigned) {
  ....
}

The precedence of the '==' operator is higher than that of the ternary operator (?:). Therefore, the conditional expression is evaluated in the wrong order and is equivalent to the following code:

if ((o == !isAssigned) ? OP_intrinsiccall : OP_intrinsiccallassigned) {
  ....
}

Since the constants OP_intrinsiccall and OP_intrinsiccallassigned are non-null, the condition will be returning true all the time, which means the body of the else branch is unreachable code.

This bug was described in the article "Checking the Ark Compiler Recently Made Open-Source by Huawei".

No. 8. Dangerous bitwise operations


V1046 Unsafe usage of the bool' and 'int' types together in the operation '&='. GSLMultiRootFinder.h 175

int AddFunction(const ROOT::Math::IMultiGenFunction & func) {
  ROOT::Math::IMultiGenFunction * f = func.Clone();
  if (!f) return 0;
  fFunctions.push_back(f);
  return fFunctions.size();
}

template<class FuncIterator>
bool SetFunctionList( FuncIterator begin, FuncIterator end) {
  bool ret = true;
  for (FuncIterator itr = begin; itr != end; ++itr) {
    const ROOT::Math::IMultiGenFunction * f = *itr;
    ret &= AddFunction(*f);
  }
  return ret;
}

The code suggests that the SetFunctionList function traverses an iterator list. If at least one iterator is invalid, the function returns false, or true otherwise.

However, the SetFunctionList function can return false even for valid iterators. Let's figure out why.The AddFunction function returns the number of valid iterators on the fFunctions list. That is, adding non-null iterators will cause the list to incrementally grow in size: 1, 2, 3, 4, and so on. This is where the bug comes into play:

ret &= AddFunction(*f);

Since the function returns a value of type int rather than bool, the '&=' operation will return false for even values because the least significant bit of an even number is always set to zero. This is how one subtle bug can break the return value of SetFunctionsList even when its arguments are valid.

If you were reading the snippet carefully (and you were, weren't you?), you could have noticed that it came from the project ROOT. Yes, we checked it too: "Analyzing the Code of ROOT, Scientific Data Analysis Framework".

No. 7. Variables mixed up


V1001 [CWE-563] The 'Mode' variable is assigned but is not used by the end of the function. SIModeRegister.cpp 48

struct Status {
  unsigned Mask;
  unsigned Mode;

  Status() : Mask(0), Mode(0){};

  Status(unsigned Mask, unsigned Mode) : Mask(Mask), Mode(Mode) {
    Mode &= Mask;
  };
  ....
};

It's very dangerous to use the same names for function arguments as for class members because you risk mixing them up. And that's exactly what happened here. The following expression doesn't make sense:

Mode &= Mask;

The function's argument changes, and that's it. This argument is not used in any way after that. What the programmer really wanted to write was probably the following:

Status(unsigned Mask, unsigned Mode) : Mask(Mask), Mode(Mode) {
  this->Mode &= Mask;
};

This bug was found in LLVM. We have a tradition to check this project every now and then. This year we checked it one more time.

No. 6. C++ has its own laws


This bug stems from the fact that C++ rules don't always follow mathematical rules or «common sense». Look at the small snippet below and try to find the bug yourself.

V709 Suspicious comparison found: 'f0 == f1 == m_fractureBodies.size()'. Remember that 'a == b == c' is not equal to 'a == b && b == c'. btFractureDynamicsWorld.cpp 483

btAlignedObjectArray<btFractureBody*> m_fractureBodies;

void btFractureDynamicsWorld::fractureCallback()
{
  for (int i = 0; i < numManifolds; i++)
  {
    ....
    int f0 = m_fractureBodies.findLinearSearch(....);
    int f1 = m_fractureBodies.findLinearSearch(....);

    if (f0 == f1 == m_fractureBodies.size())
      continue;
    ....
  }
....
}

The condition seems to be checking that f0 is equal to f1 and is equal to the number of elements in m_fractureBodies. It was probably meant to check if f0 and f1 are located at the end of the m_fractureBodies array since they contain an object position found by the findLinearSearch() method. But in reality, this conditional expression checks if f0 is equal to f1 and then if m_fractureBodies.size() is equal to the result of the expression f0 == f1. That is, the third operand here is checked against 0 or 1.

That's a nice bug! And, fortunately, a pretty rare one. So far we have seen it only in three open-source projects, and, interestingly, all the three were game engines. This is not the only bug found in Bullet; the most interesting ones were described in the article "PVS-Studio Looked into the Red Dead Redemption's Bullet Engine".

No. 5. What's at the end of the line?


This one is easy if you know one tricky detail.

V739 EOF should not be compared with a value of the 'char' type. The 'ch' should be of the 'int' type. json.cpp 762

void JsonIn::skip_separator()
{
  signed char ch;
  ....
  if (ch == ',') {
    if( ate_separator ) {
      ....
    }
    ....
  } else if (ch == EOF) {
  ....
}

This is one of those bugs that you can't easily spot if you don't know that EOF is defined as -1. So, if you try to compare it with a variable of type signed char, the condition will almost always be false. The only exception is the character encoded as 0xFF (255). When compared with EOF, this character will turn into -1, thus making the condition true.

A lot of bugs in this year's Top 10 were found in computer gaming software: engines or open-source games. As you already guessed, this one came from that area too. More errors are described in the article "Cataclysm Dark Days Ahead: Static Analysis and Roguelike Games".

No. 4. The magic constant Pi


V624 There is probably a misprint in '3.141592538' constant. Consider using the M_PI constant from <math.h>. PhysicsClientC_API.cpp 4109

B3_SHARED_API void b3ComputeProjectionMatrixFOV(float fov, ....)
{
  float yScale = 1.0 / tan((3.141592538 / 180.0) * fov / 2);
  ....
}


There's a tiny typo in the Pi number (3,141592653...): the number «6» is missing at the 7th decimal place.

Picture 4
An incorrect one-millionth decimal digit would hardly cause any noticeable harm, but it's still better to use existing constants from libraries, whose correctness is guaranteed. The Pi number, for instance, is represented by the constant M_PI from the header math.h.

You already read about this bug in the article "PVS-Studio Looked into the Red Dead Redemption's Bullet Engine", where it was placed sixth. If you haven't read it yet, this is your last chance.

A small diversion


We are approaching the Top 3 most interesting bugs. As you have probably noticed, I'm sorting the bugs not by their impact but by the effort it takes a human reviewer to find them. After all, the advantage of static analysis over code reviews is basically the inability of software tools to get tired or forget things. :)

Now, let's see what we have in our Top 3.

Picture 8


No. 3. An elusive exception


V702 Classes should always be derived from std::exception (and alike) as 'public' (no keyword was specified, so compiler defaults it to 'private'). CalcManager CalcException.h 4

class CalcException : std::exception
{
public:
  CalcException(HRESULT hr)
  {
    m_hr = hr;
  }
  HRESULT GetException()
  {
    return m_hr;
  }
private:
  HRESULT m_hr;
};

The analyzer has detected a class derived from the std::exception class using the private modifier (which is used by default if not specified otherwise). The problem with this code is that an attempt to catch a generic std::exception will cause the program to miss an exception of type CalcException. This behavior stems from the fact that private inheritance forbids implicit type conversion.

You definitely wouldn't like to see your program crash because of a missed public modifier. By the way, I bet you have used this application at least once in your life because it's the good old Windows Calculator, which we also checked earlier this year.

No. 2. Unclosed HTML tags


V735 Possibly an incorrect HTML. The "</body>" closing tag was encountered, while the "</div>" tag was expected. book.cpp 127

static QString makeAlgebraLogBaseConversionPage() {
  return
    BEGIN
    INDEX_LINK
    TITLE(Book::tr("Logarithmic Base Conversion"))
    FORMULA(y = log(x) / log(a), log<sub>a</sub>x = log(x) / log(a))
    END;
}

As it often happens, C/C++ source code doesn't say much by itself, so let's take a look at the preprocessed code generated from the snippet above:

Picture 6


The analyzer has found an unclosed <div> tag. There are many html-code fragments here, so the authors need to revise it.

Surprised we can diagnose this kind of bugs? I was impressed too when I saw that for the first time. So, yes, we do know something about analyzing html code. Well, only if it's within C++ code. :)

Not only is this bug placed second but it's a second calculator on our Top 10 list. To learn what other bugs we found in this project, see the article "Following in the Footsteps of Calculators: SpeedCrunch".

No. 1. Elusive standard functions


Here's the bug placed first. This one is an impressively weird bug, which managed to make it through the code review.

Try to find it yourself:

static int
EatWhitespace (FILE * InFile)
  /* ----------------------------------------------------------------------- **
   * Scan past whitespace (see ctype(3C)) and return the first non-whitespace
   * character, or newline, or EOF.
   *
   *  Input:  InFile  - Input source.
   *
   *  Output: The next non-whitespace character in the input stream.
   *
   *  Notes:  Because the config files use a line-oriented grammar, we
   *          explicitly exclude the newline character from the list of
   *          whitespace characters.
   *        - Note that both EOF (-1) and the nul character ('\0') are
   *          considered end-of-file markers.
   *
   * ----------------------------------------------------------------------- **
   */
{
    int c;

    for (c = getc (InFile); isspace (c) && ('\n' != c); c = getc (InFile))
        ;
    return (c);
}                               /* EatWhitespace */

Now let's see what the analyzer has to say:

V560 A part of conditional expression is always true: ('\n' != c). params.c 136.

Weird, isn't it? Let's take a look at some other curious spot but in a different file (charset.h):

#ifdef isspace
#undef isspace
#endif
....
#define isspace(c) ((c)==' ' || (c) == '\t')

Hm, this is strange indeed… So, if the c variableis equal to '\n', then the seemingly harmless function isspace( c ) willreturn false,thus preventing the second part of the check from executing due to short-circuit evaluation. And if isspace( c ) executes, the c variablewill be equal either to ' ' or '\t', which is obviously not being equal to '\n'.

You could argue that this macro is similar to #define true false and code like that would never make it through a code review. But this particular snippet did – and was sitting in the repository waiting to be discovered.

For more detailed commentary on this bug, see the article "Wanna Play a Detective? Find the Bug in a Function from Midnight Commander".

Conclusion


Picture 9


We have found tons of bugs over this year. Those were common copy-paste mistakes, inaccurate constants, unclosed tags, and lots of other defects. But our analyzer is evolving and learning to diagnose more and more types of issues, so we are certainly not going to slow down and will be publishing new articles about bugs found in projects just as regularly as before.

Just in case you haven't read our articles before, all these bugs were found using our PVS-Studio static analyzer, which you are welcome to download and try on your own projects. It detects bugs in programs written in C, C++, C#, and Java.

You've finally got to the finish line! If you missed the first two levels, I suggest that you seize the opportunity and complete these levels with us: C# and Java.

Комментарии (2)


  1. Cerberuser
    21.12.2019 10:52

    2 forces me to ask, is there any C++-specific code involved in this check, or you simply need to get the generated string literal (and it doesn't really matter how)?

    P.S. Has anyone already joked about "Unicorn of Persia"? :)


    1. YMinaev
      23.12.2019 11:26
      +1

      We just get a string literal and search for HTML tags inside it. No specific C++ magic here, just good old pattern matching.

      P.S. Has anyone already joked about «Unicorn of Persia»? :)

      It's never late to be the first :)