Rails Creator - Catching stragglers

Rails Creator - Catching stragglers

A mad-man's investigation

·

9 min read

Setting

This came up with a need to implement a creator to centralize creation of a model, instead of using Active Record Callbacks (due to associated complexity and prior issues with them). But here's the conundrum:

How do you know every instance of a creation is actually using the Creator Pattern?

This type of problem can be particularly hard to solve, since there are a few sneaky ways in which a model object can be created. Here's a sample, off the top of my head, from most obvious to hardest to find:

  1. Model.create/Model.create!
  2. Model.new followed by Model.save or Model.save!
  3. The elusive nested attributes. Particularly sneaky if you're not familiar with that bit of Rails magic.
  4. Storing the Model in a variable, and then running the creation methods against the variable. Albeit powerful, this can be particularly hard to discover, especially when coupled with Dependency Injection. E.g.:
    def self.creation(model_variable, attributes)
     model_variable.create!(attributes)  
    end
    

The Problem

We want to answer the following questions:

  1. Are there instances where our model creates an object without using the Creator?
  2. For any such instance, where is it located in the codebase?

Towards a solution

Glimpse

In what I could only describe as a fever dream, I suddenly had a glimpse of hope. What if the Active Record Callbacks, which initially caused us grief, could actually help us? This gives us an entry into every instance of a creation. But then...what?

Initial Approach

Once we are actually using the Active Record Callback, the new question to ask is "How do we know where we came from?". This is to answer our two questions:

  1. We want to identify whether the object is created by the Creator, which should be our "centralized pattern".
  2. If we're not coming from the Creator, then we want to know how we got there in order to report it.

Luckily, programming languages tend to keep track of the Stack Trace (particularly helpful when raising an error). And ruby exposes it.

Proof of Concept

Equipped with the trusty console, binding.pry and this wild idea, my journey began. I opted to create a very simple model (Primary) and started to test things out. My initial model looked as follows, with one after_create hook, simply connected to a binding.pry.

class Primary < ApplicationRecord
  after_create :test_location

  private

  def test_location
    binding.pry
  end

end

I then went into the console, created a new Primary and looked at the StackTrace. My first surprise was with how much of the trace was populated with rails activity, mostly from ActiveRecord and ActiveSupport. This meant that gems themselves would make the trace harder to parse, and it wouldn't be as simple as seeing what line we had right before the callback.

After some research, I discovered that ActiveSupport actually has a backtrace cleaner that can simplify this, although it will come at a performance cost. I thus updated my test_location method:

def test_location
    backtrace = Thread.current.backtrace
    bc = ActiveSupport::BacktraceCleaner.new
    bc.add_filter   { |line| line.gsub(Rails.root.to_s, '') }
    bc.add_silencer { |line| /puma|rubygems/.match?(line) }
    backtrace = bc.clean(backtrace)
    binding.pry
  end

Now, the backtrace was actually more succinct and cleaner.

Next, I investigated what the trace looked like from a call within the code. I created a service with a simple creator method and looked at the trace:

class PrimaryCreator
  def self.create_primary
    Primary.create!
  end
end

Here is the backtrace I got:

["/app/models/primary.rb:7:in `backtrace'",
 "/app/models/primary.rb:7:in `test_location'",
 "/app/services/primary_creator.rb:3:in `create_primary'",
 "(irb):1:in `<top (required)>'",
 "bin/rails:4:in `require'",
 "bin/rails:4:in `<main>'"]

Just as desired, we see right away where it was created. Now, as a simple improvement, let's update the test_location to test whether we're coming from the creator:

  def test_location
    backtrace = Thread.current.backtrace
    bc = ActiveSupport::BacktraceCleaner.new
    bc.add_filter   { |line| line.gsub(Rails.root.to_s, '') }
    bc.add_silencer { |line| !/primary_creator/.match?(line) }
    creator_used = bc.clean(backtrace).size.positive?
    binding.pry
  end

Let's now test this with a few more scenarios to test "creation leaks":

class PrimaryLeakAnalysis
  def self.leak
    primary_variable = Primary
    primary_variable.create!
  end

  def self.use_creator
    PrimaryCreator.create_primary
  end
end

Here, if we run PrimaryLeakAnalysis.leak, we'll get a false value for creator_used, whereas if we run PrimaryLeakAnalysis.use_creator, creator_used will be true.

Lastly, let's test the situation with nested attributes. For this, we create a Secondary model:

class Secondary < ApplicationRecord
  belongs_to :primary

  accepts_nested_attributes_for :primary
end

Let's also create a new method for this situation:

  def self.nested_leak
    params = { primary_attributes: {} }
    Secondary.create(params)
  end

Here, once we call this method, we once again see that our trace doesn't come from PrimaryCreator, and can trace it back to coming from PrimaryLeakAnalysis.nested_leak. However, it is important to note that the backtrace has no explicit indication that we're creating a Primary object through creation of a Secondary object.

Reporting

We now have a way to detect and identify these instances, what next? Well, that's up to you. We want to identify them and take an action on it. but there are different actions we could take, each with tradeoffs. Here are a few:

Raise Error

A simple solution could be to raise an error and then look at the report to analyze the error and backtrace, but this could result in problematic behavior. The object is already created, there might have been other side-effects, and we're suddenly stopping execution. Even if we were to use before_create instead, we might already have created other objects, and we could reach an unexpected state.

Log these instances

You could simply log these instances, in the same way you usually handle other Rails logs. One potential concern with this approach, however, is that you might not get alerted, and so need to be intentional about following up.

Alert without raising an error

You could use a custom approach to notify you and your team whenever such an occurrence arises. Do keep in mind that this could become very noisy if you missed an important spot.

Performance concerns and improvements

Performance cost

While we now have a solution (at runtime) to our initial problem, finding instances where we create an object from outside our Creator, let's take a look at the actual solution. One big concern when looking at this solution is that it might be costly. At each object creation, we suddenly have a few more costly steps we need to take, since we need to parse through the backtrace. There are ways to make this parsing leaner, but this approach still requires parsing the backtrace (or using an alternate backtrace, which in itself would make every operations more costly).

Potential improvements

There are a few things we could do to avoid this cost:

Set a global variable

One naive solution would be to simply set the value of a global variable from within the creator and to check whether it is set. However, there are a lot of things we need to take into consideration, such as:

  • Concurrency
  • Resetting the variable (What if it somehow fails to reset?)
  • Uniqueness of the variable

Create a new dedicated attribute on the model

Another solution would be to create a new attribute on our model dedicated to this purpose. While this would add irrelevant data to the model's table and "pollute" it, it's also not unheard of. It would be a more "efficient" way of handling this, but would remain inelegant. Moreover, some other engineer could simply not understand this pattern and blindly update that column outside of the Creator.

Testing Concerns

Often, people use factories purely for their tests/specs. If the factory itself calls upon the model directly, instead of the creator, this could be problematic: those calls would be detected as not coming from the creator and pass through your reporting process. A few thoughts on this matter:

  1. Since the whole code base should use the creator, you might consider it meaningless to create the object without the creator, making this a non-issue (although there could be some performance concerns).
  2. You could simply check the environment and skip the process if you're in the testing environment (although, beware of this in tests where you change the environment).
  3. Your reporting process might already be disabled in the testing environment (e.g.: a notification where the webhook is inactive) or insignificant to you and your team (e.g.: logs that don't raise errors).

Regardless, this remains a concern to consider.

Edge Case

Of course, things can always get complicated. One potential edge-case would be if the creator for "model A" makes a call that eventually, but unexpectedly, creates an object for "model A" without using the creator. Indeed, this would be detected as coming from our creator, despite the fact it's unintentional. However, if that's the case, there might be different concerns about how the creator pattern is used and the rise of circular dependencies.

Potential extension: Update

The next logical step would be to consider how we would handle this when centralizing the process of updating a model object, as opposed to creating a model object. We should be able to use the same initial solution we've highlighted with the backtrace, replacing after_create with after_update (or even after_save to combine both create and update). And considering how much harder it can be to find instances of a model's object being updated than being created (since we usually need to store the object in a variable), this seems all the more critical. However, we need to be careful with some of our options.

First, since it's harder to identify all locations where we update an object for a specific model, missed instances are likely to be more prevalent, meaning a lot of potential noise through notifications.

Next, not all solutions highlighted for create work as easily for update. For instance, one would need to be more careful with the "dedicated attribute" approach. Unlike creation (a one-time action), updates can happen many times throughout the life-cycle of an object, so we need to consider not just the current update, but also the next one. If we still wanted to take this approach, we do have a few options:

  • We could update the attribute back during the callback. However, this is not straightforward. This second update would be detected as not respecting the pattern, so you would need a special clause to test against this specific change. Moreover, this would mean updating the object twice for one set of changes.
  • This new column could be datetime, and we could relate it back to an updated_at column. However, one needs to be particularly careful about proper handling of different datetime attributes when they need to be correlated.

Final Thoughts

Here, I've outlined an outlandish way to test whether the Creator pattern is respected. Note that there might be other ways to test this out, and also better improvements than those suggested here. Moreover, it's quite likely that there already is extended literature on this, but I wanted to explore this concept and share my thoughts before researching it.

Feel free to share your thoughts and suggestions, and any references to existing solutions to this problem!

Ideal Solution

I would also like to highlight that, ideally, the kind of solution we'd want would not happen at runtime, but would instead analyze the code itself. In theory, this should be possible, but would likely involve significant effort and require intricate knowledge of Rails' internals. Moreover, it would likely be susceptible to changes in Rails.