Be The Better Rails Developer
Rails has been very prominent in web development the last 5-7 years, and for good reason. It is relatively easy to learn and its abstractions make it easy to build features quickly. An unfortunate side effect of this has been that Rails developers have learned web development through Rails. This means that their understanding ends where the Rails abstractions begin.
This shallow understanding is especially problematic when it comes to database and SQL. So, here are some basic concepts that, when properly understood, will leave you considerably more qualified than about 95% of Rails developers i interview.
0. Database level validation
Before we dive into the list, i should point out a high level detail. Rails gives us validators that provide a series of niceties in terms of form error reporting. While these can save time, they are not a replacement for database level validations. The reasons for this are both performance as well as data quality related. In truth, this topic requires a post unto itself. So i will move on for the sake of brevity.
For now, just understand that the following code is categorically inferior to the other methods detailed in this list. I will touch on a couple of reasons why as we go.
class User
# This is NOT a replacement for tips 1-7 below
validates :username, presence: true, unique: true
# ...
end
1. Default and Not Null clauses
Database columns have data types as we all know. Additionally, they have some basic settings regarding nullability and default values. If a field should never be blank, the database should enforce that. Flag, or boolean columns, for example, should rarely be nullable.
So we can define column validators directly on our database columns at virtually no performance cost. Additionally, default database values remove the need for code initializers we would otherwise need to maintain. Flag columns are another obvious application here. If a table has a deleted
column, we probably want to default that to false
.
create table 'users' do |t|
t.first_name, :string
t.username, :string, null: false
t.deleted, :boolean, null: false, default: false
end
2. Column indexing
Column indices can be thought of a lot like keys in a hash. Like a key, looking up a row by an indexed column provides O(1) access to the data. For large tables, or tables that are frequently loaded by checking values in 1 or 2 columns, adding a column index can provide impressive performance gains without updating a single query.
A rule i use is, any foreign key column should additionally have an index. This is important: when you tell Rails something like belongs_to :photo
, Rails does not update the database to index user.photo_id
. This way, each direction of our user and photo lookup is fast and easy.
def up
add_index( :users, :username )
add_index( :users, :photo_id )
end
3. Unique column constraints
Database indices can also be used to enforce value uniqueness on columns. This is notable not just because of the safety it provides, but also the performance enhancements it brings over the Rails validators.
Instead of relying on Rails to query the database to confirm uniqueness every time i save a user record, we will use an index to automatically achieve the same effect. Also, as with my foreign key rule above, if a column needs to be unique, it is probably used enough that it deserves its own index anyway.
def up
add_index( :users, :username, unique: true )
end
4. Foreign keys
Foreign keys provide a layer of data validation when tables join on one another. In the case of a foreign key, we are telling the database that a column explicitly points to another tables column. A benefit of this is the database can prevent us from having foreign key columns pointing to rows that no longer exist.
def up
add_foreign_key( :users, :photos, name: 'user_photo_fk' )
end
In this example, note that i am passing an optional name
parameter to the foreign key function. This is because the naming convention Rails uses for foreign keys is uniquely useless. Save yourself the future pain and take 15 seconds to define a foreign key naming convention in your codebase.
Database foreign keys are particularly important if you are a shop that deletes database rows. While i cringe at the thought, you are all but guaranteed to have columns pointing to table rows that no longer exist. These edge cases force you to either: code a bunch of ugly presence checks, or have bugs in your code. Let's skip both outcomes and just add a 1-line solution.
Now, if you have do orphaned rows, adding a foreign key will fail as the database doesn't dare deal with them itself. So, let's say we have Users pointing to Photos that do not exist anymore. No matter, we can deal with those.
def up
# First, clear photo_id's that point to missing photo records
User.joins( 'LEFT JOIN photos "photos" ON users.photo_id = photos.id' )
.where( photos: {id: nil} )
.update_all( photo_id: nil )
# Then we can safely add our foreign key
add_foreign_key( :users, :photos, name: 'user_photo_fk' )
end
This is a bit unusual as far as Rails-style queries go. By LEFT JOIN
ing photos, we are telling the database to return all users, regardless of if they have a photo or not. Then, by specifying where( photos: {id: nil} )
we effectively eliminate users who do have photos. The remaining users can all have their photo_id
safely set to NULL.
note: If your foreign keys are on child records, in a has_many relationship, you can duplicate the solution above and simply delete the orphans instead of updating the parents
5. Update statements
The most common code-level mistake i see is a Ruby loop doing in O(3n) queries what can otherwise be accomplished in a single, sub-second UDPATE
statement.
While these fixes are very much ad-hoc, think of it like this: if you are writing a loop that does nothing but iterate over models and updates them after some programmatic manipulation, it is probably doable in an UPDATE
statement. Let's look at an example.
Pretend we are Tinder and want a basic activity_score
saved to each user. This might be a function of weighted swipe counts along with how recently they have been active. We might write it like this:
users = User.all
users.each do |user|
score = 0.0
score += 1 * user.left_swipes.count()
score += 2 * user.right_swipes.count()
score /= (Date.today - u.last_active.to_date).to_i + 1
user.update_attributes( activity_score: score )
end
While this calculates values fine, we are doing a remarkable amount of database work. We load every attribute of every user into memory, despite only referencing its id
(for joins) and last_active
. The joins are themselves queries. The update_attributes()
is also a query. Given 10,000 users, these 6 lines actually execute 30,001 queries. We can do better with an UPDATE
statement.
update_sql = <<-end_query
UPDATE users user
SET activity_score =
( 1 * COUNT(left_swipes.id) + 2 * COUNT(right_swipes.id) ) /
( (NOW()::date - user.last_active::date) + 1 )
FROM
swipes AS left_swipes,
swipes AS right_swipes
WHERE
left_swipes.user_id = user.id AND
left_swipes.direction = 'left' AND
right_swipes.user_id = user.id AND
right_swipes.direction = 'right'
end_query
User.connection.execute( update_sql )
1 query. Now, this can also be executed with Rails-style functions. However, as query logic gets more complex, i find it easier to understand in its pure SQL form. Either way, this method stands tall over a loop. We save ourselves 30,000 queries and 10,000 object instantiations to boot.
6. Multiple inserts/updates
Occasionally, you will run into CSV imports that necessitate multiple inserts or complex updates where the above methods will not help performance. In these situations, you will find things can take minutes to hours depending on the size of a migration. As with the previous example, our best bet in these positions is to reduce the raw number of database interactions.
So how do we reduce it? The following looks about as simple as we can get
csv = CSV.open( './users.csv' )
csv.each_line do |user_row|
User.create( ... )
end
But it turns out it isn't. Each create()
call will not only be building a Rails User
object, which takes time, but will also do its own save()
, which takes more time. If you also made the mistake of using validators on your User
model, expect an automated uniqueness query before each save, wasting more time still.
Instead, we can bypass all of this by building our own raw SQL statements and joining them for execution:
inserts = []
csv = CSV.open( './users.csv' )
csv.each_line do |user_row|
# Compile a list of INSERT statements...
inserts << "INSERT INTO users ( ... ) VALUES ( ... )"
if inserts.length > 100
# and insert them in batches (batch size is frankly arbitrary)
User.connection.execute( inserts.join(';') )
inserts = []
end
end
# Insert any remaining from the final batch
User.connection.execute( inserts.join(';') )
While this looks a bit uglier, the time savings we get from skipping all the work mentioned above can cause this method to run 10+ times faster than a more traditional loop with create()
statements.
Keep learning
While the most glaring issue in Rails developers is database understanding, there are course others. If you have worked in only a single framework, Rails or otherwise, i urge you to try your hand at others. Different frameworks abstract different things and swapping will highlight blindspots in your understanding.
Also, many frameworks take entirely different philosophical approaches to the web development problem. This perspective shift is excellent for expanding your web development knowledge. If you are new to web development and are still leaning your way in the Rails world, remember to stay curious. Try Sinatra, a lightweight Ruby framework that exposes a lot of pieces that Rails abstracts from you.
My grandfather says that "learning is the discovery of our own ignorance". For Rails developers, the abstractions hide complexities almost too well. You simply can't know what they haven't seen. Consider this a starting point.