Occasionally, we face the problem: along with the regular Mysql model, we also have a replica database.
This replica database seems to be kept in sync with database production, but it is not used by web servers or background workers, everything in it can only be used as read-only data.
Why do we care about it? In today’s large systems, there are often page dashboards, on which there are some features that work with large amounts of data such as: displaying statistics, information about the status of some modules in system … In these problems, the problem will be simple if we do not have to care about realtime statistics, then just query the data in the database and store the results in the cache.
Everything becomes complicated if we care about accurate data in real time, accurate to the second, then the problem of optimizing the query to return the data quickly and accurately. is the key task. That’s when you may need to replica the database, and learn the model of connecting it to Active Record in the Rails application.
Our original idea was to implement on each separate table we want to use when querying replica:
1 2 3 4 5 6 7 8 | class Account < ActiveRecord::Base end class Account class Replica < ActiveRecord::Base establish_connection "leadfeeder_replica_#{Rails.env}".to_sym end end |
In particular, config leadfeeder_replica _ # {env} is defined in the database.yml file for each environment we use (development, test, production …), this way we will have flexibility in the query:
1 2 3 4 5 6 7 8 9 | leadfeeder_replica_production: database: dummy_db username: user password: pass host: leadfeederreplica.aws.com port: 3306 leadfeeder_replica_development: <<: *development |
As you can see, there doesn’t seem to be a real DRY to define these modules in each model. Also we have to remember to call readonly! on replica logs to prevent writing to this database. This can be done automatically for each model by using the after_initialize callback in the active record:
1 2 3 4 5 6 7 | class Account class Replica < ActiveRecord::Base establish_connection "leadfeeder_replica_#{Rails.env}".to_sym after_initialize :readonly! end end |
But this has not yet solved the DRY issue on the models. The solution is to use ActiveSupport :: Concern in combination with the included callback:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | module LeadfeederReplica extend ActiveSupport::Concern included do |klass| database_config = "leadfeeder_read_#{Rails.env}".to_sym replica = (klass::Replica = Class.new(klass)) replica.establish_connection(database_config) replica.after_initialize(:readonly!) end end class Account < ActiveRecord::Base include LeadfeederReplica end ### Querying: Account::Replica.select(:id).first # Account::Replica Load (10.4ms) SELECT `accounts`.* FROM `accounts` ORDER BY `accounts`.`id` ASC LIMIT 1 => #<Account::Replica:0x007f9ec47d16e8 id: 1> ### Persistence: Account::Replica.first.save Account::Replica Load (3.2ms) SELECT `accounts`.* FROM `accounts` ORDER BY `accounts`.`id` ASC LIMIT 1 (3.7ms) BEGIN (1.7ms) ROLLBACK ActiveRecord::ReadOnlyRecord: Account::Replica is marked as readonly ### Bonus: [21] pry(main)> Account::Replica.new.is_a?(Account) => true |
Base use cases only
What happens when querying on relational models, on tables joined together:
1 2 3 4 5 6 7 8 9 10 11 | class Account < ActiveRecord::Base include LeadfeederReplica belongs_to :user end class User < ActiveRecord::Base include LeadfeederReplica has_many :accounts end User::Replica.includes(:account).first.account |
Unfortunately, in this case, it will not return Account :: Replica but only Account instance. It also queries directly into the database production when we use the lazy loading account.
Source:
https://medium.com/@konole/using-replica-database-with-your-activerecord-models-b2a8ce9ef46