AEM - Copying Large Volume of Content

· AEM Move Content,AEM Copy Huge Volume,AEM Migrate Content,AEM Copy Content,AEM Asset Migration

Option A: Vault Remote Copy (http://jackrabbit.apache.org/filevault/rcp.html)

  • Step 1 (Vault Installation): Navigate to “crx-quickstart/opt/filevault” and unzip “filevault.zip”
  • Step 2 (Vault Verification): Navigate to the bin directory, “{aem-install-folder}/crx-quickstart/opt/filevault/vault-cli-3.1.6/bin/”. Run the command “vlt --version” and you should see output like this:

“Jackrabbit FileVault [version 3.1.6] Copyright 2013 by Apache Software Foundation. See LICENSE.txt for more information.”

Troubleshooting: If you see an error about Java not being recognized, make sure JAVA_HOME is set. Steps to set JAVA_HOME (mac):

  • Edit your .bash_profile and add the lines below
    • export JAVA_HOME=$(/usr/libexec/java_home)
    • export PATH=$JAVA_HOME/jre/bin:$PATH
    • Save, exit, navigate back to the path provided in Step 2, and run “vlt --version” again
  • Step 3 (Disable Workflows): When copying large volumes of data, you can turn off rendition generation by disabling the DAM workflow launchers. Open the “Launcher” tab in the workflow console (/libs/cq/workflow/content/console.html), edit each entry whose workflow model starts with “DAM”, set the ‘Activate’ radio button to “Disable”, and save.

Note: To automate this step, you can run a curl command like the one below. The URL is quoted so the shell does not interpret the “&” and parentheses. Here enabled=false disables the launcher; sending the same request with enabled=true re-enables it.

curl -u admin:admin -X POST 'http://localhost:4502/libs/cq/workflow/launcher?_charset_=utf-8&edit=%2Fetc%2Fworkflow%2Flauncher%2Fconfig%2Fupdate_asset_mod&%3Astatus=browser&eventType=16&nodetype=nt%3Afile&glob=%2Fcontent%2Fdam(%2F.*%2F)renditions%2Foriginal&condition=&workflow=%2Fetc%2Fworkflow%2Fmodels%2Fdam%2Fupdate_asset%2Fjcr%3Acontent%2Fmodel&description=&enabled=false&excludeList=&runModes=author'
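Since the launcher has to be switched off before the copy and back on afterwards, the request above can be wrapped in a small helper. The following is only a sketch: the function names `launcher_url` and `set_launcher`, the `AEM_HOST`, `AEM_CREDS` and `DRY_RUN` variables, and the assumption that `enabled=false` disables the launcher are illustrative and not taken from AEM documentation. By default it only prints the curl command instead of sending it.

```shell
#!/usr/bin/env bash
# Sketch: toggle the update_asset_mod launcher using the request shown above.
AEM_HOST="${AEM_HOST:-http://localhost:4502}"   # assumption: local author instance
AEM_CREDS="${AEM_CREDS:-admin:admin}"           # assumption: default credentials

# Prints the launcher URL with the given enabled flag ("true" or "false").
launcher_url() {
  local enabled="$1"
  case "$enabled" in
    true|false) ;;
    *) echo "usage: launcher_url true|false" >&2; return 1 ;;
  esac
  # Query string copied from the curl command above, minus the enabled flag.
  local query='_charset_=utf-8&edit=%2Fetc%2Fworkflow%2Flauncher%2Fconfig%2Fupdate_asset_mod&%3Astatus=browser&eventType=16&nodetype=nt%3Afile&glob=%2Fcontent%2Fdam(%2F.*%2F)renditions%2Foriginal&condition=&workflow=%2Fetc%2Fworkflow%2Fmodels%2Fdam%2Fupdate_asset%2Fjcr%3Acontent%2Fmodel&description=&excludeList=&runModes=author'
  printf '%s/libs/cq/workflow/launcher?%s&enabled=%s\n' "$AEM_HOST" "$query" "$enabled"
}

# Sends the request, or only prints it when DRY_RUN=1 (the default here).
set_launcher() {
  local url
  url="$(launcher_url "$1")" || return 1
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "curl -u ${AEM_CREDS} -X POST '${url}'"
  else
    curl -u "${AEM_CREDS}" -X POST "${url}"
  fi
}

set_launcher false   # before the copy (assumed to disable the launcher)
```

Running it with `DRY_RUN=0` sends the request; calling `set_launcher true` after the copy would re-enable the launcher under the same assumption. Other DAM launchers would need their own `edit`, `glob` and `workflow` values.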

  • Step 4 (Copy): run the command below

./vlt rcp -b 100 -r -u -n {protocol}://{username}:{password}@{aem-source-hostname}:{port}/crx/-/jcr:root/{path} {protocol}://{username}:{password}@{aem-target-hostname}:{port}/crx/-/jcr:root/{path}

 

A sample command to copy content from local cq-author to cq-publish

 

./vlt rcp -b 100 -r -u -n http://admin:admin@localhost:4502/crx/-/jcr:root/content/dam/geometrixx http://admin:admin@localhost:4503/crx/-/jcr:root/content/dam/geometrixx-test

  • Note:
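    • The flags are described in the Vault Remote Copy documentation linked above: -b sets the batch size (number of nodes processed before an intermediate save), -r copies recursively, -u updates existing nodes, and -n only copies content that is newer than the target (based on last-modified properties).
    • The example above uses a batch size of 100, but it is hard to recommend a single value. I ran this on various environments, including my local machine, and saw different results depending on the batch size. For example, on my local machine I saw the best results with a batch size of around 1000.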
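Because the best batch size differs between environments, it can help to generate the same copy command for a few candidate values and time each one. The sketch below only prints the commands; the helper name `rcp_cmd` and the candidate sizes 100, 500 and 1000 are illustrative choices, and the source and target URLs are the ones from the sample command above.

```shell
#!/usr/bin/env bash
# Sketch: print the vlt rcp command for several batch sizes (nothing is copied).

# Prints one vlt rcp command; the flags mirror the sample command above.
rcp_cmd() {
  local batch="$1" src="$2" dst="$3"
  case "$batch" in
    ''|0|*[!0-9]*) echo "batch size must be a positive number" >&2; return 1 ;;
  esac
  printf './vlt rcp -b %s -r -u -n %s %s\n' "$batch" "$src" "$dst"
}

SRC="http://admin:admin@localhost:4502/crx/-/jcr:root/content/dam/geometrixx"
DST="http://admin:admin@localhost:4503/crx/-/jcr:root/content/dam/geometrixx-test"

# Candidate batch sizes are arbitrary starting points, not recommendations.
for batch in 100 500 1000; do
  rcp_cmd "$batch" "$SRC" "$DST"
done
```

To compare real runs, each printed command can be executed from the vlt bin directory with `time` in front of it. For a fair comparison the target path would need to be in the same state before each run.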

    Step 5: Re-enable the models that were disabled before running the Vault remote copy.

    • Note: If the run modes are strictly for author and you are copying content to a publish instance, you do not need to disable and re-enable them.

    Pros:

    • You can use just your local machine to set up Vault; nothing needs to be installed on the AEM source or destination server (as long as no firewall rules block access to the source and target servers).
    • Faster than package manager, and it does not consume package space the way package manager does.

    Cons:

    • Workflow models have to be disabled before the sync and re-enabled after it
    • Slow I/O

    Option B: grabbit (https://github.com/TWCable/grabbit)

    Step 1: Download grabbit.

    Step 2: Install the AEM grabbit package and Fragment bundle on AEM Server (source AEM server for Content) and AEM client (target AEM server where content is being copied to). Make sure the right grabbit version is installed.

    • AEM Grabbit Package Versions
      • v7.x - AEM 6.1 and AEM 6.2
      • v5.x - AEM 6.1
      • v4.x - AEM 6.1
      • v3.x - CQ 5.6 and AEM 6.0
      • v2.x - CQ 5.6

    Step 3: Create a config.json file (a sample is available in the package downloaded in Step 1). The file is easy to understand: make a copy of the sample and add your environment and content path details. Note: The config.json holds the details of the source AEM server that the content is copied from.

    Step 4: Run “grabbit.sh”.

    Enter the client details (server name:port), user name, and password. Next, provide the path to the config file. Note: The absolute path of the ‘.json’ or ‘.yaml’ file is needed. Further steps allow you to monitor the job; if something goes wrong, check the AEM logs on both the server and the client.

    Step 5: Cleanup. It is important to run this after the job completes. Over time, the Grabbit job repository grows in size and no pruning happens on it. This can hurt performance, and we also observed that a second run against the same path failed when the cleanup had not been done. The best way to clean up is to delete the node “/var/grabbit/job/repository” or to run the cleanup with curl (curl -u admin:admin -X POST "http://localhost:4503/grabbit/jobrepository/clean?hours=x", where ‘x’ is the number of hours).
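The cleanup call is easy to forget, so it may be worth keeping it in a small script next to the config file. The sketch below builds the curl command from Step 5 for a given host and number of hours and prints it. The function name `grabbit_clean_cmd`, the admin:admin credentials, and the value of 24 hours are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# Sketch: build the Grabbit job repository cleanup command from Step 5.

# $1 = base URL of the AEM client instance, $2 = number of hours ("x" in Step 5).
grabbit_clean_cmd() {
  local host="$1" hours="$2"
  case "$hours" in
    ''|*[!0-9]*) echo "hours must be a number" >&2; return 1 ;;
  esac
  # The URL is single-quoted so the shell does not treat "?" as a glob character.
  printf "curl -u admin:admin -X POST '%s/grabbit/jobrepository/clean?hours=%s'\n" "$host" "$hours"
}

# 24 is an arbitrary example value, not a recommendation.
grabbit_clean_cmd "http://localhost:4503" 24
```

The printed line can be pasted into a terminal, or the output piped to `sh` once the host and credentials have been checked.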

    Pros:

    1. Faster than “Vault Remote Copy”. I am unable to provide reliable stats because every run gave me different results. For example, copying 400MB took 3.2 minutes the first time and 4.1 minutes the second time (after erasing the data before copying).
    2. No need to enable/disable workflows manually. (Note: there were a couple of instances where the workflows had to be re-enabled manually after the job was completed.)

    Cons:

    1. I believe the ACL nodes are not copied (please double check this as my tests did not include "Access Control" nodes )
    2. Takes time to set up, and troubleshooting can be difficult if you run into errors
    3. Source and target paths are the same, i.e. you cannot alter the destination path when copying.