How to Remove Duplicate Data in R

During data cleansing, you often need to remove duplicate values from a dataset. Finding and removing duplicates is a very useful application of subsetting. R has a function, duplicated(), that finds duplicate values: it returns a logical vector that tells you, for each element, whether it is a duplicate of a previous value. This means that duplicated() returns FALSE for the first occurrence of a value and TRUE for every following occurrence of that value, as in the following example:

> duplicated(c(1,2,1,6,1,8))
[1] FALSE FALSE TRUE FALSE TRUE FALSE
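duplicated() also accepts a fromLast argument, which scans from the end so that the last occurrence of each value is treated as the original instead of the first. The sketch below compares both directions and shows that negating duplicated() keeps the same values that unique() returns.

```r
x <- c(1, 2, 1, 6, 1, 8)

# Default: the first occurrence of each value is FALSE, later repeats are TRUE
duplicated(x)
# [1] FALSE FALSE  TRUE FALSE  TRUE FALSE

# fromLast = TRUE: the last occurrence is FALSE, earlier repeats are TRUE
duplicated(x, fromLast = TRUE)
# [1]  TRUE FALSE  TRUE FALSE FALSE FALSE

# Dropping the TRUE positions keeps one copy of each value, like unique()
x[!duplicated(x)]
# [1] 1 2 6 8
identical(x[!duplicated(x)], unique(x))
# [1] TRUE
```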

If you call duplicated() on a data frame, R checks the observations instead: it treats every row as a single value. For example, with the built-in data frame iris:

> duplicated(iris)
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [10] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
....
 [136] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
[145] FALSE FALSE FALSE FALSE FALSE FALSE
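To see the row-wise behavior on a small scale, the sketch below builds a toy data frame (the columns id and score are made up for illustration). A row counts as a duplicate only if every column matches an earlier row; passing a subset of columns checks for duplicates on those columns alone.

```r
# Toy data frame; the column names id and score are illustrative only
df <- data.frame(
  id    = c(1, 2, 1, 3, 2),
  score = c(10, 20, 10, 30, 25)
)

# Whole-row check: only row 3 repeats an earlier row (row 1) in every column
duplicated(df)
# [1] FALSE FALSE  TRUE FALSE FALSE

# Check the id column only: rows 3 and 5 repeat an earlier id
duplicated(df["id"])
# [1] FALSE FALSE  TRUE FALSE  TRUE
```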

If you look carefully, you'll notice that row 143 is a duplicate, because the 143rd element of the result is TRUE. You can also find it with the which() function:

> which(duplicated(iris))
[1] 143
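which(duplicated(iris)) reports only the later copy of a repeated row. To see every row involved, including the first occurrence, one common approach is to combine a forward scan with a backward scan using fromLast = TRUE, as sketched below.

```r
# Flag rows that have an identical copy anywhere in the data frame,
# whether that copy comes before or after them
dup_any <- duplicated(iris) | duplicated(iris, fromLast = TRUE)

which(dup_any)
# [1] 102 143

# Inspect the matching rows side by side
iris[dup_any, ]
```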

Now, to remove the duplicate from iris, you need to exclude this row from your data. Remember that there are two ways to exclude data using subsetting:

  • Specify a logical vector, where FALSE means that the element will be excluded. The ! (exclamation point) operator is a logical negation: it converts TRUE into FALSE and vice versa. So, to remove the duplicates from iris, you do the following:

> iris[!duplicated(iris), ]

  • Specify negative indices, where each negative value excludes the element at that position. In other words:

> index <- which(duplicated(iris))
> iris[-index, ]

In both cases, the result no longer contains row 143.
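The two approaches give the same result here, but they behave differently when there are no duplicates: which() then returns integer(0), and subsetting with -integer(0) selects zero rows instead of all of them. The sketch below checks that both methods agree on iris and wraps the negative-index approach in a guard; the helper name drop_dups() is hypothetical.

```r
# Method 1: logical subsetting with negation
no_dups_logical <- iris[!duplicated(iris), ]

# Method 2: negative indices, guarded against an empty index,
# because x[-integer(0), ] returns zero rows rather than all rows
drop_dups <- function(df) {
  index <- which(duplicated(df))
  if (length(index) == 0) df else df[-index, ]
}
no_dups_negative <- drop_dups(iris)

identical(no_dups_logical, no_dups_negative)
# [1] TRUE
nrow(no_dups_logical)
# [1] 149

# On data with no duplicates, the unguarded form drops every row
clean <- iris[1:5, ]
nrow(clean[-which(duplicated(clean)), ])
# [1] 0
nrow(drop_dups(clean))
# [1] 5
```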

How To Install R & RStudio

R is an open-source, case-sensitive programming language. RStudio is an integrated development environment (IDE) for R; the company behind it (now called Posit) is an active member of the R community.

You need to install both R and RStudio on your system before getting started with R. This page guides you through the installation process and introduces both tools.

Install R

Step 1: Download the package for your system (Windows, macOS, or Linux) from the Comprehensive R Archive Network (CRAN) website.

Step 2: Install R as you would install any other software package.
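Once R is installed, you can confirm that it works by opening the R console and running a few base R checks. This is a minimal sketch; the 4.0.0 threshold is an arbitrary example, not a requirement stated on this page.

```r
# Print the installed R version string, which starts with "R version"
R.version.string

# Compare against a minimum version; "4.0.0" is only an example threshold
getRversion() >= "4.0.0"

# Show which operating system this build of R targets
R.version$os
```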

Install RStudio

Step 1: Download the RStudio Desktop installer from the RStudio website.

Step 2: Install RStudio by running the installer and following its setup steps.

Before you move on, make sure both R and RStudio are installed on your system. Once they are, you can start exploring the different components of RStudio.
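To check from code whether the current session is running inside RStudio, one option is the RSTUDIO environment variable, which RStudio sets to "1" in the R sessions it starts. This is a sketch: the helper name in_rstudio() is hypothetical, and the rstudioapi package offers a fuller check through rstudioapi::isAvailable().

```r
# Hypothetical helper: TRUE when the session appears to run inside RStudio.
# Relies on the RSTUDIO environment variable, which RStudio sets to "1".
in_rstudio <- function() {
  identical(Sys.getenv("RSTUDIO"), "1")
}

if (in_rstudio()) {
  message("Running inside RStudio")
} else {
  message("Running outside RStudio (for example, in the plain R console)")
}
```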

How To Connect R-Studio to Adobe Analytics

With the help of the Adobe Analytics Web Services API, you can use RStudio to connect to Adobe Analytics.

The Web Services APIs provide programmatic access to marketing reports and other Suite services, which let you duplicate and augment functionality available through the Analytics interface.

The Web Services settings are located under Analytics > Admin > Company Settings > Web Services.

RSiteCatalyst is an R package for accessing the Adobe Analytics API from R.
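Before using the package, it has to be installed and loaded. A minimal sketch that installs a package only when it is missing and then attaches it, assuming RSiteCatalyst is available from your configured CRAN mirror; the helper name ensure_package() is hypothetical.

```r
# Hypothetical helper: install a package only if it is not already installed,
# then attach it. Assumes the package is available from your CRAN mirror.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE lets library() take the package name as a string
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Usage:
# ensure_package("RSiteCatalyst")
```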

SCAuth: Store Credentials for the Adobe Analytics API

Usage:

SCAuth(key, secret, company = "", token.file = "", auth.method = "legacy", debug.mode = FALSE, endpoint = "", locale = "en_US")

Arguments

key
Client ID from your app in the Adobe Marketing Cloud Dev Center or, if you are using auth.method = 'legacy', the API username (username:company)

secret
Secret from your app in the Adobe Marketing Cloud Dev Center or, if you are using auth.method = 'legacy', the API shared secret

company
Your company (required only when using the OAUTH2 method)

token.file
If you would like to save your OAUTH token and other auth details for use in future sessions, specify a file here. The method checks whether the file exists and uses it if available.

auth.method
Defaults to legacy; can be set to 'OAUTH2' to use the newer OAUTH method.

debug.mode
Set global debug mode

endpoint
Set the Adobe Analytics API endpoint rather than letting RSiteCatalyst decide (not recommended)

locale
Set encoding for reports (defaults to en_US)

Details
Authorize and store credentials for the Adobe Analytics API

Value
Global credentials list 'SC.Credentials' in the AdobeAnalytics (hidden) environment

References

The list of locale values can be obtained from the Adobe Analytics documentation:

https://marketing.adobe.com/developer/documentation/analytics-reporting-1-4/r-reportdescriptionlocale

After loading the package with library(RSiteCatalyst), use the following command to authenticate:

SCAuth("pheonixm.sptz:Tots", "73ng567ygd93cf57e83ehgrteswefvaa")

Note: These credentials are a sample for illustration only; use your own API username and shared secret.
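Rather than hard-coding the username and secret in a script, you can keep them in environment variables (for example, in a ~/.Renviron file) and read them at run time. A minimal sketch; the variable names ADOBE_API_KEY and ADOBE_API_SECRET and the helper get_adobe_credentials() are hypothetical, not part of RSiteCatalyst.

```r
# Hypothetical helper: read Adobe Analytics credentials from environment
# variables. ADOBE_API_KEY and ADOBE_API_SECRET are assumed names.
get_adobe_credentials <- function(key_var = "ADOBE_API_KEY",
                                  secret_var = "ADOBE_API_SECRET") {
  key <- Sys.getenv(key_var)
  secret <- Sys.getenv(secret_var)
  if (!nzchar(key) || !nzchar(secret)) {
    stop("Set the ", key_var, " and ", secret_var, " environment variables first.")
  }
  # With auth.method = "legacy", the key is the API username (username:company)
  if (!grepl(":", key, fixed = TRUE)) {
    warning("Key has no ':'; the legacy method expects username:company.")
  }
  list(key = key, secret = secret)
}

# Usage (requires RSiteCatalyst and valid credentials):
# library(RSiteCatalyst)
# creds <- get_adobe_credentials()
# SCAuth(creds$key, creds$secret)
```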

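Sample code:

library("RSiteCatalyst")

# Authenticate with your own API username (username:company) and shared secret
SCAuth("Gaga:Ameri lines", "2f40922f54f3ad116c2ca72317661b30")

# Report window: from 91 days ago through yesterday, as YYYY-MM-DD strings
datefrom <- as.character(Sys.Date() - 91)
dateto <- as.character(Sys.Date() - 1)

# Queue a trended report; enqueueOnly = TRUE returns the report ID
# instead of waiting for the data
trended_id <- QueueTrended("productionreportsuite",
                           datefrom,
                           dateto,
                           metrics = "visits",
                           elements = c("evar10", "mobiledevicetype"),
                           segment.id = c("s300000988_5be8bc302b36874afe33502f",
                                          "s300000899_5bedfc3e706e9e558c910e1a"),
                           date.granularity = "day",
                           interval.seconds = 300,
                           top = 30000,
                           expedite = TRUE,
                           enqueueOnly = TRUE)

# Retrieve the queued report, checking every 5 seconds for up to 5 attempts
trended <- GetReport(trended_id, interval.seconds = 5, max.attempts = 5)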
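The sample computes a reporting window that starts 91 days ago and ends yesterday. Below is a sketch of that calculation as a reusable function that returns YYYY-MM-DD strings, the format the RSiteCatalyst documentation describes for report dates; the name report_window() and its arguments are hypothetical.

```r
# Hypothetical helper: build a report window relative to a reference day.
# days_back and end_offset mirror Sys.Date() - 91 and Sys.Date() - 1 above.
report_window <- function(days_back = 91, end_offset = 1, today = Sys.Date()) {
  from <- today - days_back
  to   <- today - end_offset
  stopifnot(from <= to)
  # format() on a Date yields "YYYY-MM-DD" strings
  list(from = format(from, "%Y-%m-%d"), to = format(to, "%Y-%m-%d"))
}

w <- report_window(today = as.Date("2024-03-31"))
w$from
# [1] "2023-12-31"
w$to
# [1] "2024-03-30"
```

Passing an explicit today makes the window reproducible in tests; in a scheduled script you would normally keep the Sys.Date() default.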

Note: RSiteCatalyst is a community-driven, open-source effort, not an official package from Adobe. As such, bug fixes and improvements rely on a vibrant community of users and developers contributing their free time and resources.

For documentation of the related QueueDataWarehouse() function, see: https://rdrr.io/cran/RSiteCatalyst/man/QueueDataWarehouse.html