Why PVS-Studio Doesn't Offer Automatic Fixes

The PVS-Studio static analyzer can detect bugs in fairly complex and intricate code, and coming up with appropriate fixes for such bugs can be a tough task even for human developers. That's exactly why we should avoid offering any automatic fixes at all. Here are a couple of examples.

Those who are just getting started with PVS-Studio sometimes wonder why it doesn't offer to fix bugs automatically. Interestingly, regular users don't ask this question. As you gain experience with the analyzer, it becomes clear that automatic replacement can't be applied to most bugs. At least not until we have full-fledged artificial intelligence :).

Such replacement would be possible if PVS-Studio analyzed coding style. But that's not what it is designed to do. It doesn't suggest formatting or naming edits. It doesn't offer (at least as of this writing :) automatic replacement of every NULL with nullptr in C++ code. Useful as such an edit is, it has little to do with finding and eliminating bugs.

Instead, PVS-Studio's job is to detect bugs and potential vulnerabilities. In many cases, fixing them requires a creative approach and a change in the program's behavior. Only a human developer can decide on the appropriate way to fix a given bug.

The most likely suggestion you'd get from the analyzer when it detects a defect is to simplify the code to make the anomaly go away, but that wouldn't be enough to eliminate the defect itself. Figuring out what exactly the code is intended to do and coming up with a sensible and useful fix is too difficult a job for a tool.

As an example, here's a bug discussed in my article "February 31".

static const int kDaysInMonth[13] = {
  0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
};

bool ValidateDateTime(const DateTime& time) {
  if (time.year < 1 || time.year > 9999 ||
      time.month < 1 || time.month > 12 ||
      time.day < 1 || time.day > 31 ||
      time.hour < 0 || time.hour > 23 ||
      time.minute < 0 || time.minute > 59 ||
      time.second < 0 || time.second > 59) {
    return false;
  }
  if (time.month == 2 && IsLeapYear(time.year)) {
    return time.month <= kDaysInMonth[time.month] + 1;
  } else {
    return time.month <= kDaysInMonth[time.month];
  }
}
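
To see what is off in the two final return statements, here is a small brute-force sketch. It is not part of the original code: MonthCheckIsAlwaysTrue is a hypothetical helper I introduce only for illustration. It tries the comparison from the last return statement for every month that passes the range check above.

```cpp
#include <cassert>

// Same table as in the snippet above: index 0 is unused, 1..12 are the months.
static const int kDaysInMonth[13] = {
  0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
};

// Hypothetical helper (not in the original code): returns true if the
// comparison from the final return statement holds for every valid month.
bool MonthCheckIsAlwaysTrue() {
  for (int month = 1; month <= 12; ++month) {
    // A month number is at most 12, while every table entry is at least 28.
    if (!(month <= kDaysInMonth[month])) {
      return false;
    }
  }
  return true;
}
```

The helper returns true: a month number never exceeds 12, and the smallest entry in the table is 28. The leap-year branch only adds 1 to the right-hand side, so it can't fail either.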

The analyzer realizes that both final comparisons always evaluate to true. But it doesn't know why. It knows nothing about days, months, and other entities. And you'd have a very hard time trying to teach it those things. The only thing you can possibly teach it to do is to offer to simplify the function:

bool ValidateDateTime(const DateTime& time) {
  if (time.year < 1 || time.year > 9999 ||
      time.month < 1 || time.month > 12 ||
      time.day < 1 || time.day > 31 ||
      time.hour < 0 || time.hour > 23 ||
      time.minute < 0 || time.minute > 59 ||
      time.second < 0 || time.second > 59) {
    return false;
  }
  if (time.month == 2 && IsLeapYear(time.year)) {
    return true;
  } else {
    return true;
  }
}

Well, why stop at that? Let's have the analyzer apply the following fix:

bool ValidateDateTime(const DateTime& time) {
  if (time.year < 1 || time.year > 9999 ||
      time.month < 1 || time.month > 12 ||
      time.day < 1 || time.day > 31 ||
      time.hour < 0 || time.hour > 23 ||
      time.minute < 0 || time.minute > 59 ||
      time.second < 0 || time.second > 59) {
    return false;
  }
  return true;
}

It's funny, but it misses the point. The analyzer has removed the portion of code that is redundant from the viewpoint of the C++ language. Yet only a human developer can determine whether the code is indeed redundant (which is very often the case) or contains a typo, so that month must be replaced with day.
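
For comparison, here is roughly what the fix looks like if the developer decides it was a typo. This is only a sketch under assumptions: the article doesn't show DateTime or IsLeapYear, so both definitions below are stand-ins, and the name ValidateDateTimeFixed is mine.

```cpp
#include <cassert>

// Assumed shape of DateTime; its real definition is not shown in the article.
struct DateTime {
  int year, month, day, hour, minute, second;
};

// Assumed helper: the usual Gregorian leap-year rule.
static bool IsLeapYear(int year) {
  return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

static const int kDaysInMonth[13] = {
  0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
};

bool ValidateDateTimeFixed(const DateTime& time) {
  if (time.year < 1 || time.year > 9999 ||
      time.month < 1 || time.month > 12 ||
      time.day < 1 || time.day > 31 ||
      time.hour < 0 || time.hour > 23 ||
      time.minute < 0 || time.minute > 59 ||
      time.second < 0 || time.second > 59) {
    return false;
  }
  // The day, not the month, is compared against the length of the month.
  if (time.month == 2 && IsLeapYear(time.year)) {
    return time.day <= kDaysInMonth[time.month] + 1;
  } else {
    return time.day <= kDaysInMonth[time.month];
  }
}
```

With this version, February 29 is accepted only in a leap year, and dates like April 31 are rejected. Nothing in the language itself tells a tool that this, rather than deleting the checks, is what the author meant.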

You may say that I'm dramatizing things and that automatic replacement is a viable option. No, it's not. Even we humans make mistakes trying to figure such issues out, so how can we expect better judgement from an inanimate computer program? Here's an interesting example of a careless manual fix that doesn't actually fix anything. If the human fails, the machine will surely fail too.

In August of this pandemic year, I posted an article covering the issues found in the PMDK library. Among other defects, I discussed one bug that compromised overflow protection:

static DWORD
get_rel_wait(const struct timespec *abstime)
{
  struct __timeb64 t;
  _ftime64_s(&t);
  time_t now_ms = t.time * 1000 + t.millitm;
  time_t ms = (time_t)(abstime->tv_sec * 1000 +
    abstime->tv_nsec / 1000000);

  DWORD rel_wait = (DWORD)(ms - now_ms);

  return rel_wait < 0 ? 0 : rel_wait;
}

Since the rel_wait variable is unsigned, the subsequent check rel_wait < 0 is pointless. PVS-Studio's diagnostic message: V547 [CWE-570] Expression 'rel_wait < 0' is always false. Unsigned type value is never < 0. os_thread_windows.c 359
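
To make the consequence concrete, here is a tiny sketch of what the cast does when the deadline is already in the past. The assumptions are mine: std::uint32_t stands in for DWORD (a 32-bit unsigned integer on Windows), std::int64_t stands in for time_t, and CastLikeGetRelWait is a hypothetical helper, not a PMDK function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the cast in get_rel_wait():
// the signed difference is converted straight to a 32-bit unsigned value.
std::uint32_t CastLikeGetRelWait(std::int64_t ms, std::int64_t now_ms) {
  // A negative difference wraps around modulo 2^32 instead of staying negative.
  return static_cast<std::uint32_t>(ms - now_ms);
}
```

For a deadline 5 ms in the past, CastLikeGetRelWait(1000, 1005) yields 4294967291, which is about 49.7 days of waiting rather than zero.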

The article inspired somebody to fix the bugs it mentioned in bulk: Fix various issues reported by PVS-Studio analysis.

What fix do you think they suggested? Quite a plain one: core: simplify windows timer implementation.

But it only simplifies the code; it doesn't fix it! Somebody else noticed this and opened a discussion: ISSUE: os_thread_windows.c — get_rel_wait() will block if abstime is in the past.
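
For illustration only, here is one way the intent of the original check could be preserved: compare while the value is still signed, and cast afterwards. This is a sketch under my own assumptions, not the fix adopted in PMDK. RelWaitClamped is a hypothetical name, std::uint32_t stands in for DWORD, and std::int64_t stands in for time_t.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: computes the relative wait in milliseconds,
// clamping it to the range of a 32-bit unsigned value.
std::uint32_t RelWaitClamped(std::int64_t ms, std::int64_t now_ms) {
  // Keep the difference signed so that "deadline in the past" is visible.
  const std::int64_t diff = ms - now_ms;
  if (diff < 0) {
    return 0;  // the deadline has already passed: don't wait
  }
  // Assumption: saturate instead of silently truncating a huge difference.
  const std::int64_t max_wait = std::numeric_limits<std::uint32_t>::max();
  if (diff > max_wait) {
    return std::numeric_limits<std::uint32_t>::max();
  }
  return static_cast<std::uint32_t>(diff);
}
```

Even in this small sketch there are decisions a tool can't make for you. Whether to saturate at the upper bound, and at what value, depends on how the caller interprets the result, and only a developer who knows that code can tell.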

As you can see, even humans make mistakes when trying to come up with a fix. Machines are just hopeless in that respect.

Actually, when you come to think of it, the wish for bugs to get fixed automatically is quite an odd one. Every fix demands care and close inspection of the code. Besides, a warning may turn out to be a false positive, in which case the code must not be touched at all. Code analysis and bug fixing don't tolerate haste. A better strategy is to run the analysis regularly and fix freshly introduced bugs.
