All most all corporations move data from one database to another or from one location to another for various purposes. There are various products like GoldenGate, Data Guard, Attunity, Tungsten and others for replicating data. I often hear one product is better than other and I don't want to deny that but what matters in the data replication is the design of the database and the replication strategies.
These are my replication best practices which I learnt the hard way so I and my colleagues can sleep without getting a single page for latencies, locked tables or other replication errors. I don't make any exception to these rules for any customers or clients because this is going to bite us and become a nightmare to support it.
Limit the number of tables to as low as possible. Don't replicate the whole schema or database using a single replication task. Divide them into many jobs and balance big and small tables in terms of volume and data size.
Don't replicate CLOB or LOB. Many replication tools either don't support or truncate CLOB/LOB column even though they advertise it.
1-1 table replication between the source and target table. The replication always should be the 1-1 table from source and target and it can't be from many tables in the source to one table in the target. Some tools support MERGE logic but this is a terrible idea when data volumes grow.
Limit the number of Constraints in Target tables. PK is the only constraints that you need to avoid duplicate. Adding too many Constraints slow down the replication and those constraints check should happen in source and not target table.
Absolutely No DDL Replication just DML Replication. Yes, most tool support this feature but I would disable these features to prevent the target table from being ALTERed, TRUNCATEd, and DROPped.
The table structure should be the same in both source and target. The source and target table should have the same table structures including data types and the only thing that can be different is the constraint on target tables.
Not your ETL or Data Manipulation Tool. Remember, this is your replication tool, not your ETL tool. Even though they allow some kind of data manipulation, we should limit the data manipulation for data conversion from one database to another. The will and can slow down the replication.
Don't Advertise real-time replication. There are no tools for real-time replication and never promise anyone regardless of data volumes. With the best database design and replication best practices, you can achieve close to near-real-time but there are so many other factors like network, database load, bandwidth which can increase latencies which you don't have control.
No Replication within the same database. Yes, I have seen a replication happening from one schema to another schema within the same Oracle database. Similarly, in one database to another database within a SQL Server Instance. Don't set up this type of replication even though replication is supported within a database, use script or other database vendor provided tools. It causes a lot of latency and generates redo log that isn't necessarily causing overhead to the database.
No Merge/UPSERT logic. This is similar to that of rule #3, repeating again to reinforce the practice
Never enable Bi-Directional replication feature. This is the most dangerous replication practices of all and I would tell my client there is no such feature out there. Anyone has enabled bi-directional replication? Please tell us more on the project and how do you support when there is data mismatch issue? I would like to learn more and everyone here would like to know more.
Violating these Eleven Golden Replication rules may suffer into a loss of sleep, latency anxiety, replication PTSD and much, much more. I hope you use them at your own risk to benefit your health, the database health and make no exception to these rules. Oops, I just got a latency alert, I will see you after the latency is caught up.