Properties have always been a subject of numerous discussions and arguments, and I am not the one to resolve all of them. I am here to suggest to you an approach which I like and use.
Problems of parameterless methods
Even though I have no objective to fight mutability, I only consider immutable objects for this article.
Consider a type Integer defined as follows:
public sealed record Integer(int Value);
It is some artificial abstract object. For example, in math every integer inevitably has its tripled value. Even today many people would still define it the "classical" way, with a method which computes the tripled value and returns it:
public sealed record Integer(int Value)
{
    public Integer Triple() => new Integer(Value * 3);
}
Now, let me show the downsides of this fairly naive approach. Here is a piece of code calling this method:
public int SomeMethod(Integer number)
{
    var tripled = number.Triple();
    if (tripled.Value > 5)
        return tripled.Value;
    else
        return 1;
}
In the fourth and fifth lines we address tripled's Value. In fact, I could have written it in one line, but then I would have to call Triple twice, which might hurt the performance significantly.
You might know that in C# 9, with its advanced pattern matching, you can write number.Triple() is { Value: > 5 } res ? res.Value : 1, and that is true.
However, you have to agree that you cannot guarantee how many times this method is going to be called. You will need to add caching around it by creating an additional table of values; otherwise, it may be called multiple times. In either case the performance, readability, and maintainability of your code are likely to be negatively affected.
Now it is the property's turn. This is what it should look like:
public int SomeMethod(Integer number)
=> number.Tripled.Value > 5 ? number.Tripled.Value : 1;
As you can see, I address the Tripled property without any worry that I call it too many times. In fact, this property will return the same object, but it will spend time and memory only when it is addressed the first time.
As a user of the code, you should not have to care about the performance cost of such properties. But you would if it were a method performing an active action (that is, an action which is guaranteed to consume noticeable CPU time or memory).
What is the solution?
First of all, as a type's author, I must care about the experience of the programmer using my type. As a user, I never want to call an outside library's methods too many times, so I would end up caching their results somewhere, ruining my code. What I should see instead are properties that I never need to cache; let them be fields for me. Looks like a field, acts like a field.
Second, we need to guarantee that our property is indeed not an active action. Internally, it will call the bound method, but only once. You may think of it as lazy, on-demand initialization. Because the object itself is responsible for this property, the user cannot check whether the property has already been initialized, which removes the temptation for useless optimizations.
It sounds bad that the user is limited
With those properties, when addressing one you cannot tell whether it will return immediately or consume a few milliseconds to process.
However, if it was not a property, it would be a method which you would still call. The first time the property will be as expensive as calling a method, so at least you do not lose performance.
If you could check whether a property is initialized AND would change the behaviour of your code depending on it, there is probably something very strange with your code.
Third, I still care about my own comfort, given that I am the designer of my type. I cannot have decorators in C#, at most I have source generators, which, as they cannot modify the code, seem useless for this.
What I found the most convenient is having a private field next to every public property where your cache would be stored. Not only that, the initializer method should not be in the cache field's constructor (because you cannot address fields of a type when initializing a new field). Instead, it must be in the property (so that we could address type's fields).
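The idea of a private cache field next to a public property, with the initializer living in the property, can be sketched in its simplest form as follows. This is only an illustration under stated assumptions: the type CachedInteger and its Computations counter are hypothetical names that do not appear in the article, a plain class is used instead of a record so that the cache field does not interfere with generated equality, and no thread safety is attempted.

```csharp
using System;

// Hypothetical illustration type (not from the article or its library).
public sealed class CachedInteger
{
    public int Value { get; }

    // Hypothetical counter, present only to make the caching observable.
    public static int Computations;

    // Private cache stored right next to the public property.
    private CachedInteger? tripled;

    public CachedInteger(int value) => Value = value;

    // The initializer lives in the property, so it can address instance members.
    // Not thread-safe: two threads could both run ComputeTripled().
    public CachedInteger Tripled => tripled ??= ComputeTripled();

    private CachedInteger ComputeTripled()
    {
        Computations++;
        return new CachedInteger(Value * 3);
    }
}
```

Addressing Tripled repeatedly on the same instance runs ComputeTripled once and returns the same object afterwards. The approaches discussed below differ mainly in how they deal with records, equality, and concurrency.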
That is it about this pattern. Now I am going to share my thoughts about the implementation of this pattern in real life.
Down into details
Here I am going to cover my thoughts rather than find the best way of implementing this pattern.
Approach 1. Fake property:
public sealed record Number(int Value)
{
    public Number Tripled => new Number(Value * 3);
}
To me it looks like an anti-pattern. Imagine profiling your code and discovering that addressing a property consumes so much CPU time and RAM, even though a property syntactically looks the same as a field. If you do not want to cache it, use a method instead.
Now we are going to cover approaches with permanent caching/lazy initialization.
Approach 2. Lazy<T>:
public sealed record Number : IEquatable<Number>
{
    // we have to declare the ctor ourselves, so it makes no sense to make Value a record parameter anyway
    public int Value { get; init; }
    public Number Tripled => tripled.Value;
    private Lazy<Number> tripled;
    public Number(int value)
    {
        Value = value;
        // we cannot do this in the field's initializer, because it cannot address other instance members
        tripled = new(() => new Number(value * 3));
    }
    // Equals generated for records is based on all fields, hence Lazy<T> would affect the comparison
    public bool Equals(Number? other) => other is not null && Value == other.Value;
    // same logic with GetHashCode
    public override int GetHashCode() => Value.GetHashCode();
}
To me it looks awful. Not only can we no longer use record features like the auto-implemented Equals and GetHashCode, but I can barely imagine adding a new property, because this would only increase the "entropy" of this messy code. Also, every time you add a field which has to affect the equality/hash code, you will need to add it to both Equals and GetHashCode.
As we can see, we have to put the initialization of Lazy<T> in a place different from the field. That is, while declaring the field in one place, we assign to it in a completely different place (the constructor). If you want to add a new property, you will need to add a lazy field somewhere and its initialization in the constructor.
Another problem with this code is the with operator, which clones all your fields aside from those you explicitly reassign. This operator will clone your Lazy<T> field as well, even though it was only valid for the first instance and might not be for the new field values. This implies that the behaviour of with should also be customized (for records this means writing your own copy constructor), and every time you add a real field which should be copied, you will have to add it there as well.
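To make the with problem concrete, here is a small sketch. The types StaleNumber and FreshNumber are hypothetical and only mirror the shape of Approach 2; they cache an int instead of a Number to stay short, and the equality overrides are left out. The second type shows one possible remedy, assuming a user-defined copy constructor is acceptable.

```csharp
using System;

// Hypothetical type: the Lazy<T> captures the constructor argument,
// and 'with' copies the Lazy<T> reference into the clone.
public sealed record StaleNumber
{
    public int Value { get; init; }
    public int TripledValue => tripled.Value;
    private readonly Lazy<int> tripled;

    public StaleNumber(int value)
    {
        Value = value;
        tripled = new Lazy<int>(() => value * 3);
    }
}

// Hypothetical type: a user-defined copy constructor gives every clone its own Lazy<T>.
public sealed record FreshNumber
{
    public int Value { get; init; }
    public int TripledValue => tripled.Value;
    private readonly Lazy<int> tripled;

    public FreshNumber(int value)
    {
        Value = value;
        tripled = new Lazy<int>(() => Value * 3);
    }

    // 'with' calls this constructor first and assigns the new Value afterwards.
    // The lambda reads Value only when the property is first addressed.
    private FreshNumber(FreshNumber original)
    {
        Value = original.Value;
        tripled = new Lazy<int>(() => Value * 3);
    }
}
```

With these definitions, (new StaleNumber(2) with { Value = 5 }).TripledValue still yields 6, while the same expression on FreshNumber yields 15. The price is exactly the maintenance burden described above: every real field has to be copied by hand in the copy constructor.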
Approach 3. Using ConditionalWeakTable:
using System.Runtime.CompilerServices;

public sealed record Number(int Value)
{
    public Number Tripled => tripled.GetValue(this, @this => new Number(@this.Value * 3));
    private static readonly ConditionalWeakTable<Number, Number> tripled = new();
}
It looks fairly concise. In terms of design it implements what I want to see: a private field carrying a lambda initializer, which will "hit" once the property is addressed the first time.
There are a couple of minor problems. First, it only accepts reference types, so you will need to wrap a value type in a class or record. Second, it is also a bit slower than it could be, especially when using primitive types (it takes about 6x as much time as my naive implementation).
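As an illustration of the first limitation, here is one way the wrapping could look when the cached result is an int. The type Measurement and the property TripledValue are hypothetical names. StrongBox<T> from System.Runtime.CompilerServices is used as a ready-made reference-type wrapper, which is my choice for this sketch and not something the article prescribes.

```csharp
using System.Runtime.CompilerServices;

// Hypothetical type for illustration only.
public sealed record Measurement(int Value)
{
    // ConditionalWeakTable requires reference types for both key and value,
    // so the int result is wrapped in a StrongBox<int>.
    private static readonly ConditionalWeakTable<Measurement, StrongBox<int>> tripledValue = new();

    // The factory runs on the first access for a given instance;
    // later accesses find the existing box in the table.
    public int TripledValue =>
        tripledValue.GetValue(this, static m => new StrongBox<int>(m.Value * 3)).Value;
}
```

The extra allocation for the box and the table lookup on every access are part of why this approach tends to be slower for primitive results.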
My naive implementation
I only provide this for the sake of completeness of the information, so it is more of an example of what it could look like.
Let me consider the key points:
- This private container will be a struct (why have another unnecessary allocation?).
- Its Equals and GetHashCode will always return true and 0 respectively. Although it is a workaround, this fairly simple trick lets us avoid overriding these two methods in the owning type. That is, you still get a correct comparison and hash code from Roslyn's generated Equals and GetHashCode, even if you have private fields dedicated to caches.
- Any type is allowed for <T> (unlike in Approach 3). We are going to lock the field's holder, not the field itself.
- We will pass the factory in the property itself, so that we can address any field without needing to declare our own constructor (as we had to in Approach 2).
- When internally checking whether our property is initialized, we compare references of holders. If, say, you applied the with operator, then even though this private field is copied along with the others, your property will be reinitialized the first time it is addressed in the new instance.
I called my struct for permanent caching/lazy initialization FieldCache. This is what its fields look like:
public struct FieldCache<T> : IEquatable<FieldCache<T>>
{
    private T value;
    private object holder; // we will only compare references, hence there is no need to make it generic
    // like I said earlier, to keep the field from affecting Equals and GetHashCode, we make them permanently return true and 0
    public bool Equals(FieldCache<T> _) => true;
    public override int GetHashCode() => 0;
}
Now, this is what a naive implementation of the method GetValue looks like:
public struct FieldCache<T> : IEquatable<FieldCache<T>>
{
    public T GetValue<TThis>(Func<TThis, T> factory, TThis @this)
        // record is class internally. We need the holder to
        // be a reference type so that we could safely compare by it
        where TThis : class
    {
        // if the holder's reference has changed or is null
        if (!ReferenceEquals(@this, holder))
            lock (@this)
            {
                if (!ReferenceEquals(@this, holder))
                {
                    // we pass this to the factory, so that the
                    // property using FieldCache could address
                    // local properties/fields or methods
                    // without the need to recreate a capturing lambda
                    value = factory(@this);
                    holder = @this;
                }
            }
        return value;
    }
}
Now, this is how my type is designed in the end:
public sealed record Number(int Value)
{
    public Number Tripled => tripled.GetValue(@this => new Number(@this.Value * 3), this);
    private FieldCache<Number> tripled;
}
It is significantly faster than ConditionalWeakTable, although slower than Lazy<T>:
| Method               | Mean    |
| BenchFunction        | 4600 ns |
| Lazy<T>              | 0.67 ns |
| FieldCache<T>        | 3.67 ns |
| ConditionalWeakTable | 25 ns   |
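To check the key points listed earlier (one computation per instance, reinitialization after with, and untouched record equality), the pieces can be put together in one self-contained sketch. FieldCache<T> below is a condensed copy of the naive implementation from this article. The record Num and its FactoryCalls counter are hypothetical and exist only to make the behaviour observable.

```csharp
using System;

// Condensed copy of the naive FieldCache<T>, repeated so the sketch compiles on its own.
public struct FieldCache<T> : IEquatable<FieldCache<T>>
{
    private T value;
    private object holder;

    // Constant results keep the cache out of the record's generated Equals/GetHashCode.
    public bool Equals(FieldCache<T> _) => true;
    public override int GetHashCode() => 0;

    public T GetValue<TThis>(Func<TThis, T> factory, TThis @this) where TThis : class
    {
        if (!ReferenceEquals(@this, holder))
            lock (@this)
            {
                if (!ReferenceEquals(@this, holder))
                {
                    value = factory(@this);
                    holder = @this;
                }
            }
        return value;
    }
}

// Hypothetical demo type.
public sealed record Num(int Value)
{
    // Hypothetical counter; static fields take no part in record equality.
    public static int FactoryCalls;

    public Num Tripled => tripled.GetValue(static n =>
    {
        FactoryCalls++;
        return new Num(n.Value * 3);
    }, this);

    // Must not be readonly: GetValue mutates the struct in place.
    private FieldCache<Num> tripled;
}
```

In this sketch, addressing Tripled twice on one instance invokes the factory once. A copy made through with carries the old cache along, but its holder reference no longer matches, so the value is recomputed on first access. Two Num instances with the same Value still compare equal whether or not their caches are filled.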
A real world example?
UPD: thanks to BkmzSpb's recommendation, this example was added.
I used this pattern in a symbolic algebra library I work on. Every mathematical expression has Evaled, InnerSimplified, and a few other properties which only depend on the expression itself. They get addressed numerous times in methods like Simplify(), Solve(), Integrate() and others.
That is why it was decided to find a way to cache them. One of the project's contributors moved them to caching via ConditionalWeakTable, which improved the performance by far. Then I implemented FieldCache, which also gave a significant boost.
Here is a performance report for the methods whose performance was affected by lazy properties:
The impact is unimaginable.
In conclusion
If, in your immutable type, you have a method with no parameters but some computations, you may want to replace it with a cacheable property to make sure that the computation runs at most once, as well as to encapsulate those computations in a property that looks and feels like a field.
Nonetheless, you may experience problems with the existing approaches, so the problem, in a sense, remains open.
Like I said, I do use this pattern, so I implemented it for my projects. The source code is available on GitHub. I do not pose it as the best solution, so instead, you are likely to develop your own type, or take an existing one which fits your needs.
The main point of this article: use cacheable properties instead of parameterless methods for immutable objects.
Thank you for your attention, I hope I helped some people to reconsider their view on this problem.
WhiteBlackGoose (Author)
About use cases. In my project, a symbolic algebra library, I work with mathematical expressions. At the beginning of the project, I had a method Evaluate(). Every time I called it I knew that it would take some time to process, so I wouldn't call it multiple times.
However, it's hard to track how many times it is called for an expression: there are many methods, such as Simplify(), Solve() and many others, which used this method. In fact, there were many cases when the same expression's Evaluate() was called repeatedly.
With the implementation of this pattern, I don't hesitate to address the Evaluated property. This helps a lot when designing and developing new functionality and makes the code faster and safer.
BkmzSpb
It really depends on the use cases. Personally, I would avoid introducing additional fields in types that are designed to be primitives. For instance, if I implement Int128 using 4 UInt32, the struct is 'blittable' and can be easily moved and copied in memory. Another important point is if for each instance of some type there may be several derived instances (like for each int you have 3 * int), the memory consumption of caching can explode if instances are distributed uniformly in some way (for int it means regular uniform distribution). If you have an array of YourInteger from 0 to 128, which you iterated over, cached and obtained Tripled, you have twice the memory footprint.
Perhaps the problem can be solved by some smart caching (e.g., depending on context), but I doubt there will be any benefit for small workloads (evaluating the same thing several times can be faster than doing all the fancy caching). I can imagine that if you attempt to compute n terms of a series, and each term depends on, say, x, x^2, x^3, then for n > 10 (arbitrary limit) caching x^i could be beneficial.
WhiteBlackGoose (Author)
Thank you for your answer. I should mention that here I consider a more or less high level of programming.
Yes, that is a good point. But it is relevant for low-level programming. Also, struct is struct, here I'm talking more about records, which are reference types, and copying them by memory is… well, you know, unsafe and prohibited.
If you address their Tripled property — yes, it is so. But you rarely need to process huge arrays of high-level objects (if you do, there might be something wrong with the architecture).
You are about right, though the way it works in my case is that I frequently need the evaluated form of the same expression in many complicated methods.
So, the key point is: the pattern is more for immutable high-level objects rather than for super-fast calculations of primitive structures.
BkmzSpb
Then there is a very simple way to convince people that your approach is the best: take an example out of your domain (something that is more complicated than 3 * x) and benchmark naive computations vs caching on a typical workload size. The table you posted is not that informative to me, or perhaps I misread it. But a good summary table next to benchmarked code snippets will help a lot.
WhiteBlackGoose (Author)
I have this table (comparing the 914th and the 920th commits). Thank you for your advice, I will insert it in the article (but I need to add the first column). The impact might really vary from project to project, though.
WhiteBlackGoose (Author)
The full table added.