How to subset dataframe based on a “not equal to” criteria applied to a large number of columns?


I'm new to R and am trying to subset my data according to predefined exclusion criteria for my analysis. At the moment I'm trying to remove all cases that have dementia, as coded by ICD-10. The problem is that there are many variables (~70) containing information on each individual's disease status; because they are all coded the same way, the same condition can be applied to all of them.

Some simulated data:

#Create dataframe containing simulated data
df = data.frame(ID = c(1001, 1002, 1003, 1004, 1005,1006,1007,1008,1009,1010,1011),
 disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),
 disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),
 disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'))

#data is structured as below:

 ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
3 1003 G560 G20 NA
4 1004 D235 NA I802
5 1005 B178 NA NA
6 1006 F011 A049 A481
7 1007 F023 NA NA
8 1008 C761 NA NA
9 1009 H653 G300 NA
10 1010 A049 G308 NA
11 1011 J679 A045 D352

Here, I'm trying to remove any case that has a 'dementia code' across any of the "disease_code" variables.

#Remove cases with dementia from dataframe (e.g. F023, G20)
Newdata_df <- subset(df, (2:4 != "F023"|"G20"|"F009"|"F002"|"F001"|"F000"|"F00"| 
 "G309"| "G308"|"G301"|"G300"|"G30"| "F01"|"F018"|"F013"|
 "F012"| "F011"| "F010"|"F01"))

The error I receive is:

Error in 2:4 != "F023" | "G20" : 
 operations are possible only for numeric, logical or complex types
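The error comes from operator precedence: != binds more tightly than |, so R first evaluates 2:4 != "F023" (the numbers 2, 3 and 4 compared with a string, not columns 2 to 4 of df) and then tries to combine that logical vector with the string "G20" using |, which only accepts numeric, logical or complex values. A minimal sketch that reproduces this and shows the vectorised %in% membership test instead:

```r
# 2:4 is simply the integer vector c(2, 3, 4); it is coerced to character
# and compared with "F023", so every element is TRUE
cmp <- 2:4 != "F023"

# `!=` binds tighter than `|`, so the original expression is
# (2:4 != "F023") | "G20", and `|` rejects character operands
msg <- tryCatch((2:4 != "F023") | "G20",
                error = function(e) conditionMessage(e))
print(msg)

# Testing values against a set of codes is what %in% is for
codes <- c("F023", "G20")
in_set <- c("I802", "F023", "G20") %in% codes  # FALSE TRUE TRUE
```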

Ideally, the subsetted dataframe would look like this:

 ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
4 1004 D235 NA I802
5 1005 B178 NA NA
8 1008 C761 NA NA
11 1011 J679 A045 D352

I know there is an error in my code, but I'm not sure exactly how to fix it. I've tried a few other ways (using dplyr), but haven't had any luck so far.

Any help is greatly appreciated!

edited 15 hours ago

Sotos


asked 15 hours ago

M_Oxford


You should reshape your data to long format. That will make your life (and analysis) much easier.

– docendo discimus
15 hours ago


r dataframe filter subset

6 Answers

One dplyr possibility could be:

library(dplyr)

df %>%
 filter_at(vars(2:4), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00", 
 "G309", "G308","G301","G300","G30", "F01","F018","F013",
 "F012", "F011", "F010","F01")))

 ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
3 1004 D235 NA I802
4 1005 B178 NA NA
5 1008 C761 NA NA
6 1011 J679 A045 D352

In this case, all_vars() requires the condition to hold in every one of columns 2:4, so only the rows in which none of those columns contains any of the given codes are kept.

Or:

df %>%
 filter_at(vars(contains("disease_code")), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00", 
 "G309", "G308","G301","G300","G30", "F01","F018","F013",
 "F012", "F011", "F010","F01")))

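In this case, the columns are selected by name rather than by position: only the rows in which none of the columns whose names contain disease_code holds any of the given codes are kept. This scales to many columns without having to list their positions.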
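Note that filter_at() and all_vars() are superseded as of dplyr 1.0.0. A sketch of the same filter written with if_all() (available from dplyr 1.0.4), assuming the question's df and a dementia_codes vector holding the codes listed in the question:

```r
library(dplyr)

df <- data.frame(ID = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
                 disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),
                 disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),
                 disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'),
                 stringsAsFactors = FALSE)

dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00",
                    "G309", "G308", "G301", "G300", "G30", "F01", "F018",
                    "F013", "F012", "F011", "F010")

# Keep a row only if, in every disease_code column, the value is not a dementia code
result <- df %>%
  filter(if_all(starts_with("disease_code"), ~ !(.x %in% dementia_codes)))

result
```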

edited 14 hours ago

answered 15 hours ago

tmfmnk



Thanks everyone for your suggestions! I appreciate that you also explained what your suggested code does @tmfmnk - really useful!

– M_Oxford
13 hours ago


We can create a vector with the codes to be removed, check each disease column against it with %in%, and use rowSums to keep only the rows with zero matches, i.e.

codes_to_remove <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",
 "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")

df[rowSums(sapply(df[-1], `%in%`, codes_to_remove)) == 0,]

which gives,

 ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
4 1004 D235 NA I802
5 1005 B178 NA NA
8 1008 C761 NA NA
11 1011 J679 A045 D352
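With ~70 disease-code columns, df[-1] assumes that every column except the first is a disease code. A sketch that instead picks the columns by name and builds the match matrix with lapply() plus cbind(), which, unlike sapply(), still returns a matrix when the data has a single row (the name prefix disease_code is taken from the question's example data):

```r
df <- data.frame(ID = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
                 disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),
                 disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),
                 disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'),
                 stringsAsFactors = FALSE)

codes_to_remove <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",
                     "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010")

# Select the disease-code columns by name rather than by position
code_cols <- startsWith(names(df), "disease_code")

# Logical matrix: one row per case, one column per disease-code variable
hits <- do.call(cbind, lapply(df[code_cols], function(col) col %in% codes_to_remove))

# Keep the cases with zero matches across all disease-code columns
Newdata_df <- df[rowSums(hits) == 0, ]
Newdata_df
```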

answered 15 hours ago

Sotos


As mentioned in the comments by @docendo discimus, we can convert the dataframe to long format using gather, group by ID, keep only those IDs that have none of the dementia_code values, and then spread them back to wide format.

library(tidyverse)

df %>%
 gather(key, value, -ID) %>%
 group_by(ID) %>%
 filter(!any(value %in% dementia_code)) %>%
 spread(key, value)

# ID disease_code_1 disease_code_2 disease_code_3
# <dbl> <chr> <chr> <chr> 
#1 1001 I802 A071 H250 
#2 1002 H356 NA NA 
#3 1004 D235 NA I802 
#4 1005 B178 NA NA 
#5 1008 C761 NA NA 
#6 1011 J679 A045 D352
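In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A sketch of the same long-format approach with the newer verbs, assuming the question's df and the same dementia_code vector listed under data:

```r
library(dplyr)
library(tidyr)

df <- data.frame(ID = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
                 disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),
                 disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),
                 disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'),
                 stringsAsFactors = FALSE)

dementia_code <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309",
                   "G308", "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010")

result <- df %>%
  # one row per (ID, disease-code column) pair
  pivot_longer(-ID, names_to = "key", values_to = "value") %>%
  group_by(ID) %>%
  # drop every ID that has at least one dementia code
  filter(!any(value %in% dementia_code)) %>%
  ungroup() %>%
  # back to one row per ID
  pivot_wider(names_from = key, values_from = value)

result
```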

data

dementia_code <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", 
"G308","G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")

edited 14 hours ago

answered 15 hours ago

Ronak Shah


Why load all of tidyverse? Isn't this just tidyr and dplyr?

– Dunois
14 hours ago


@Dunois yes, it is. I have a habit of loading it all up by default :P

– Ronak Shah
14 hours ago


We could also do it using an anti_join such as Newdata_df <- df %>% anti_join(df %>% gather(DiseaseCodeNumber, CodeValue, -ID) %>% filter(CodeValue %in% c("F023","G20","F009","F002","F001","F000","F00", "G309", "G308","G301","G300","G30","F01","F018","F013", "F012", "F011","F010","F01")), by = "ID")

– Kerry Jackson
14 hours ago


How about this:

> dementia <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",
+ "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")
> 
> has_dementia <- apply(sapply(df[, -1], function(x) x %in% dementia), 1, any)
> 
> df[!has_dementia,]
 ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
4 1004 D235 NA I802
5 1005 B178 NA NA
8 1008 C761 NA NA
11 1011 J679 A045 D352
>

Edit:

An even more elegant solution, thanks to @Ronak Shah:

> df[apply(df[-1], 1, function(x) !any(x %in% dementia)),]
 ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
4 1004 D235 NA I802
5 1005 B178 NA NA
8 1008 C761 NA NA
11 1011 J679 A045 D352

Hope it helps.

edited 14 hours ago

answered 15 hours ago

Santiago Capobianco


@Ronak Shah Nice! It's a more elegant solution. You should post it.

– Santiago Capobianco
14 hours ago


Yes! Sorry, I will change it right away.

– Santiago Capobianco
14 hours ago


We can use melt/dcast from data.table

library(data.table)
# as.data.table() works on a copy; setDT(df) would turn df itself into a data.table
dcast(melt(as.data.table(df), id.vars = 'ID')[,
 if(!any(value %in% dementia_codes)) .SD, .(ID)], ID ~ variable)
# ID disease_code_1 disease_code_2 disease_code_3
#1: 1001 I802 A071 H250
#2: 1002 H356 NA NA
#3: 1004 D235 NA I802
#4: 1005 B178 NA NA
#5: 1008 C761 NA NA
#6: 1011 J679 A045 D352

Or, more compactly, in base R with no reshaping:

df[!Reduce(`|`, lapply(df[-1], `%in%` , dementia_codes)),]
 # ID disease_code_1 disease_code_2 disease_code_3
#1 1001 I802 A071 H250
#2 1002 H356 NA NA
#4 1004 D235 NA I802
#5 1005 B178 NA NA
#8 1008 C761 NA NA
#11 1011 J679 A045 D352
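To see what the Reduce() call does: lapply() produces one logical vector per disease-code column (TRUE where that column holds a dementia code), and Reduce(`|`, ...) ORs those vectors together element-wise, giving TRUE for every row with a match in any column. A small sketch of the intermediate steps on the question's data, using the same dementia_codes listed under data:

```r
df <- data.frame(ID = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
                 disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),
                 disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),
                 disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'),
                 stringsAsFactors = FALSE)

dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309",
                    "G308", "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010")

# One logical vector per disease-code column;
# e.g. disease_code_1 is TRUE only in rows 6 and 7 (F011, F023)
per_column <- lapply(df[-1], `%in%`, dementia_codes)

# Element-wise OR across the columns: (col1 | col2) | col3
any_match <- Reduce(`|`, per_column)

which(any_match)  # rows that will be dropped
new_df <- df[!any_match, ]
```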

data

dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", 
 "F00", "G309", "G308", "G301", "G300", "G30", "F01", "F018", "F013", 
 "F012", "F011", "F010", "F01")

edited 13 hours ago

answered 13 hours ago

akrun


A for loop version with base R, in case you prefer that.

df <- data.frame(ID = c(1001, 1002, 1003, 1004, 1005,1006,1007,1008,1009,1010,1011),
 disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),
 disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),
 disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'), stringsAsFactors = FALSE)

dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308", "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")

new_df <- df[0,]

for (i in seq_len(nrow(df))) {
  currRow <- df[i,]
  # keep the row only if none of its values is a dementia code
  if (!any(dementia_codes %in% as.character(currRow))) {
    new_df <- rbind(new_df, currRow)
  }
}


new_df
# ID disease_code_1 disease_code_2 disease_code_3
# 1 1001 I802 A071 H250
# 2 1002 H356 NA NA
# 4 1004 D235 NA I802
# 5 1005 B178 NA NA
# 8 1008 C761 NA NA
# 11 1011 J679 A045 D352
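Growing new_df with rbind() inside the loop copies the data frame on every iteration, which gets slow on large data. If you prefer a loop, a variant that fills a preallocated logical vector and subsets once at the end (a sketch using the same df and dementia_codes as above):

```r
df <- data.frame(ID = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
                 disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),
                 disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),
                 disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'),
                 stringsAsFactors = FALSE)

dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",
                    "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010")

# Preallocate one TRUE/FALSE flag per row instead of growing a data frame
keep <- logical(nrow(df))
for (i in seq_len(nrow(df))) {
  # the disease codes of row i as a plain character vector
  row_values <- unlist(df[i, -1], use.names = FALSE)
  keep[i] <- !any(row_values %in% dementia_codes)
}

new_df <- df[keep, ]
new_df
```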

edited 14 hours ago

answered 14 hours ago

Dunois

858

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

M_Oxford is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55417645%2fhow-to-subset-dataframe-based-on-a-not-equal-to-criteria-applied-to-a-large-nu%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

6 Answers
6

active

oldest

votes

6 Answers
6

active

oldest

votes

One dplyr possibility could be:

df %>%
 filter_at(vars(2:4), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00", 
 "G309", "G308","G301","G300","G30", "F01","F018","F013",
 "F012", "F011", "F010","F01")))

 ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
3 1004 D235 NA I802
4 1005 B178 NA NA
5 1008 C761 NA NA
6 1011 J679 A045 D352

In this case, it checks whether any of the columns 2:4 contains any of the given codes.

Or:

df %>%
 filter_at(vars(contains("disease_code")), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00", 
 "G309", "G308","G301","G300","G30", "F01","F018","F013",
 "F012", "F011", "F010","F01")))

In this case, it checks whether any of the columns with names disease_code contains any of the given codes.

edited 14 hours ago

answered 15 hours ago

tmfmnk

3,5591516

1

Thanks everyone for your suggestions! I appreciate that you also explained what your suggested code does @tmfmnk - really useful!

– M_Oxford
13 hours ago

add a comment |

One dplyr possibility could be:

df %>%
 filter_at(vars(2:4), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00", 
 "G309", "G308","G301","G300","G30", "F01","F018","F013",
 "F012", "F011", "F010","F01")))

 ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
3 1004 D235 NA I802
4 1005 B178 NA NA
5 1008 C761 NA NA
6 1011 J679 A045 D352

In this case, it checks whether any of the columns 2:4 contains any of the given codes.

Or:

df %>%
 filter_at(vars(contains("disease_code")), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00", 
 "G309", "G308","G301","G300","G30", "F01","F018","F013",
 "F012", "F011", "F010","F01")))

In this case, it checks whether any of the columns with names disease_code contains any of the given codes.

edited 14 hours ago

answered 15 hours ago

tmfmnk

3,5591516

1

Thanks everyone for your suggestions! I appreciate that you also explained what your suggested code does @tmfmnk - really useful!

– M_Oxford
13 hours ago

add a comment |

One dplyr possibility could be:

df %>%
 filter_at(vars(2:4), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00", 
 "G309", "G308","G301","G300","G30", "F01","F018","F013",
 "F012", "F011", "F010","F01")))

 ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
3 1004 D235 NA I802
4 1005 B178 NA NA
5 1008 C761 NA NA
6 1011 J679 A045 D352

In this case, it checks whether any of the columns 2:4 contains any of the given codes.

Or:

df %>%
 filter_at(vars(contains("disease_code")), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00", 
 "G309", "G308","G301","G300","G30", "F01","F018","F013",
 "F012", "F011", "F010","F01")))

In this case, it checks whether any of the columns with names disease_code contains any of the given codes.

edited 14 hours ago

answered 15 hours ago

tmfmnk

3,5591516

One dplyr possibility could be:

df %>%
 filter_at(vars(2:4), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00", 
 "G309", "G308","G301","G300","G30", "F01","F018","F013",
 "F012", "F011", "F010","F01")))

 ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
3 1004 D235 NA I802
4 1005 B178 NA NA
5 1008 C761 NA NA
6 1011 J679 A045 D352

In this case, it checks whether any of the columns 2:4 contains any of the given codes.

Or:

df %>%
 filter_at(vars(contains("disease_code")), all_vars(! . %in% c("F023","G20","F009","F002","F001","F000","F00", 
 "G309", "G308","G301","G300","G30", "F01","F018","F013",
 "F012", "F011", "F010","F01")))

In this case, it checks whether any of the columns with names disease_code contains any of the given codes.

edited 14 hours ago

answered 15 hours ago

tmfmnk

3,5591516

edited 14 hours ago

answered 15 hours ago

tmfmnk

3,5591516

answered 15 hours ago

tmfmnk

3,5591516

answered 15 hours ago

tmfmnk

3,5591516

1

Thanks everyone for your suggestions! I appreciate that you also explained what your suggested code does @tmfmnk - really useful!

– M_Oxford
13 hours ago

add a comment |

1

Thanks everyone for your suggestions! I appreciate that you also explained what your suggested code does @tmfmnk - really useful!

– M_Oxford
13 hours ago

Thanks everyone for your suggestions! I appreciate that you also explained what your suggested code does @tmfmnk - really useful!

– M_Oxford
13 hours ago

add a comment |

We can create a vector with the codes to be removed and use rowSums to remove, i.e.

codes_to_remove <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",
 "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")

df[rowSums(sapply(df[-1], `%in%`, codes_to_remove)) == 0,]

which gives,

 ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
4 1004 D235 NA I802
5 1005 B178 NA NA
8 1008 C761 NA NA
11 1011 J679 A045 D352

answered 15 hours ago

Sotos

31.1k51741

add a comment |

We can create a vector with the codes to be removed and use rowSums to remove, i.e.

codes_to_remove <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",
 "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")

df[rowSums(sapply(df[-1], `%in%`, codes_to_remove)) == 0,]

which gives,

 ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
4 1004 D235 NA I802
5 1005 B178 NA NA
8 1008 C761 NA NA
11 1011 J679 A045 D352

answered 15 hours ago

Sotos

31.1k51741

add a comment |

We can create a vector with the codes to be removed and use rowSums to remove, i.e.

codes_to_remove <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",
 "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")

df[rowSums(sapply(df[-1], `%in%`, codes_to_remove)) == 0,]

which gives,

 ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
4 1004 D235 NA I802
5 1005 B178 NA NA
8 1008 C761 NA NA
11 1011 J679 A045 D352

answered 15 hours ago

Sotos

31.1k51741

We can create a vector with the codes to be removed and use rowSums to remove, i.e.

codes_to_remove <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",
 "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")

df[rowSums(sapply(df[-1], `%in%`, codes_to_remove)) == 0,]

which gives,

 ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
4 1004 D235 NA I802
5 1005 B178 NA NA
8 1008 C761 NA NA
11 1011 J679 A045 D352

answered 15 hours ago

Sotos

31.1k51741

answered 15 hours ago

Sotos

31.1k51741

answered 15 hours ago

Sotos

31.1k51741

answered 15 hours ago

Sotos

31.1k51741

add a comment |

library(tidyverse)

df %>%
 gather(key, value, -ID) %>%
 group_by(ID) %>%
 filter(!any(value %in% dementia_code)) %>%
 spread(key, value)

# ID disease_code_1 disease_code_2 disease_code_3
# <dbl> <chr> <chr> <chr> 
#1 1001 I802 A071 H250 
#2 1002 H356 NA NA 
#3 1004 D235 NA I802 
#4 1005 B178 NA NA 
#5 1008 C761 NA NA 
#6 1011 J679 A045 D352

data

dementia_code <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", 
"G308","G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")

edited 14 hours ago

answered 15 hours ago

Ronak Shah

43.8k104266

Why load all of tidyverse? Isn't this just tidyr and dplyr?

– Dunois
14 hours ago

1

@Dunois yes, it is. I have a habit of loading it all up by default :P

– Ronak Shah
14 hours ago

3

We could also do it using an anti_join such as Newdata_df <- df %>% anti_join(df %>% gather(DiseaseCodeNumber, CodeValue, -ID) %>% filter(CodeValue %in% c("F023","G20","F009","F002","F001","F000","F00", "G309", "G308","G301","G300","G30","F01","F018","F013", "F012", "F011","F010","F01")), by = "ID")

– Kerry Jackson
14 hours ago

add a comment |

library(tidyverse)

df %>%
 gather(key, value, -ID) %>%
 group_by(ID) %>%
 filter(!any(value %in% dementia_code)) %>%
 spread(key, value)

# ID disease_code_1 disease_code_2 disease_code_3
# <dbl> <chr> <chr> <chr> 
#1 1001 I802 A071 H250 
#2 1002 H356 NA NA 
#3 1004 D235 NA I802 
#4 1005 B178 NA NA 
#5 1008 C761 NA NA 
#6 1011 J679 A045 D352

data

dementia_code <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", 
"G308","G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")

edited 14 hours ago

answered 15 hours ago

Ronak Shah

43.8k104266

Why load all of tidyverse? Isn't this just tidyr and dplyr?

– Dunois
14 hours ago

1

@Dunois yes, it is. I have a habit of loading it all up by default :P

– Ronak Shah
14 hours ago

3

We could also do it using an anti_join such as Newdata_df <- df %>% anti_join(df %>% gather(DiseaseCodeNumber, CodeValue, -ID) %>% filter(CodeValue %in% c("F023","G20","F009","F002","F001","F000","F00", "G309", "G308","G301","G300","G30","F01","F018","F013", "F012", "F011","F010","F01")), by = "ID")

– Kerry Jackson
14 hours ago

add a comment |

library(tidyverse)

df %>%
 gather(key, value, -ID) %>%
 group_by(ID) %>%
 filter(!any(value %in% dementia_code)) %>%
 spread(key, value)

# ID disease_code_1 disease_code_2 disease_code_3
# <dbl> <chr> <chr> <chr> 
#1 1001 I802 A071 H250 
#2 1002 H356 NA NA 
#3 1004 D235 NA I802 
#4 1005 B178 NA NA 
#5 1008 C761 NA NA 
#6 1011 J679 A045 D352

data

dementia_code <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", 
"G308","G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")

edited 14 hours ago

answered 15 hours ago

Ronak Shah

43.8k104266

library(tidyverse)

df %>%
 gather(key, value, -ID) %>%
 group_by(ID) %>%
 filter(!any(value %in% dementia_code)) %>%
 spread(key, value)

# ID disease_code_1 disease_code_2 disease_code_3
# <dbl> <chr> <chr> <chr> 
#1 1001 I802 A071 H250 
#2 1002 H356 NA NA 
#3 1004 D235 NA I802 
#4 1005 B178 NA NA 
#5 1008 C761 NA NA 
#6 1011 J679 A045 D352

data

dementia_code <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", 
"G308","G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")

edited 14 hours ago

answered 15 hours ago

Ronak Shah

43.8k104266

edited 14 hours ago

answered 15 hours ago

Ronak Shah

43.8k104266

answered 15 hours ago

Ronak Shah

43.8k104266

answered 15 hours ago

Ronak Shah

43.8k104266

Why load all of tidyverse? Isn't this just tidyr and dplyr?

– Dunois
14 hours ago

1

@Dunois yes, it is. I have a habit of loading it all up by default :P

– Ronak Shah
14 hours ago

3

We could also do it using an anti_join such as Newdata_df <- df %>% anti_join(df %>% gather(DiseaseCodeNumber, CodeValue, -ID) %>% filter(CodeValue %in% c("F023","G20","F009","F002","F001","F000","F00", "G309", "G308","G301","G300","G30","F01","F018","F013", "F012", "F011","F010","F01")), by = "ID")

– Kerry Jackson
14 hours ago

add a comment |

Why load all of tidyverse? Isn't this just tidyr and dplyr?

– Dunois
14 hours ago

1

@Dunois yes, it is. I have a habit of loading it all up by default :P

– Ronak Shah
14 hours ago

3

We could also do it using an anti_join such as Newdata_df <- df %>% anti_join(df %>% gather(DiseaseCodeNumber, CodeValue, -ID) %>% filter(CodeValue %in% c("F023","G20","F009","F002","F001","F000","F00", "G309", "G308","G301","G300","G30","F01","F018","F013", "F012", "F011","F010","F01")), by = "ID")

– Kerry Jackson
14 hours ago

Why load all of tidyverse? Isn't this just tidyr and dplyr?

– Dunois
14 hours ago

@Dunois yes, it is. I have a habit of loading it all up by default :P

– Ronak Shah
14 hours ago

We could also do it using an anti_join such as

Newdata_df <- df %>% anti_join(df %>% gather(DiseaseCodeNumber, CodeValue, -ID) %>% filter(CodeValue %in% c("F023","G20","F009","F002","F001","F000","F00", "G309", "G308","G301","G300","G30","F01","F018","F013", "F012", "F011","F010","F01")), by = "ID")

– Kerry Jackson
14 hours ago

We could also do it using an anti_join such as

Newdata_df <- df %>% anti_join(df %>% gather(DiseaseCodeNumber, CodeValue, -ID) %>% filter(CodeValue %in% c("F023","G20","F009","F002","F001","F000","F00", "G309", "G308","G301","G300","G30","F01","F018","F013", "F012", "F011","F010","F01")), by = "ID")

– Kerry Jackson
14 hours ago

add a comment |

How about this:

> dementia <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",
+ "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")
> 
> dementia <- apply(sapply(df[, -1], function(x) x %in% dementia), 1, any)
> 
> df[!dementia,]
 ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
4 1004 D235 NA I802
5 1005 B178 NA NA
8 1008 C761 NA NA
11 1011 J679 A045 D352
>

Edit:

An even more elegant solution, thanks to @ Ronan Shah:

> df[apply(df[-1], 1, function(x) !any(x %in% dementia)),]
     ID disease_code_1 disease_code_2 disease_code_3
1  1001           I802           A071           H250
2  1002           H356             NA             NA
4  1004           D235             NA           I802
5  1005           B178             NA             NA
8  1008           C761             NA             NA
11 1011           J679           A045           D352

Hope it helps.
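The sapply()/apply() combination above builds a logical matrix (one column per code column) and then scans it row by row. A minimal sketch of a fully vectorized variant, assuming all code columns are character so that as.matrix() yields a character matrix: test every cell with %in% at once, restore the matrix shape, and count hits per row with rowSums(). The names hits and result are illustrative.

```r
# Example data from the question; missing codes are stored as the string 'NA'
df <- data.frame(
  ID = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
  disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),
  disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),
  disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'),
  stringsAsFactors = FALSE
)
dementia <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",
              "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")

# Test every cell at once; %in% drops the dimensions, so rebuild an
# nrow(df) x ncol(df[-1]) logical matrix (both fill column by column)
hits <- matrix(as.matrix(df[-1]) %in% dementia, nrow = nrow(df))

# A row is excluded when it contains at least one dementia code
has_dementia <- rowSums(hits) > 0

result <- df[!has_dementia, ]
result
```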

edited 14 hours ago

answered 15 hours ago

Santiago Capobianco

@Ronak Shah Nice! It's a more elegant solution. You should post it.

– Santiago Capobianco
14 hours ago

Yes! Sorry, I will change it right away.

– Santiago Capobianco
14 hours ago

We can use melt/dcast from data.table:

library(data.table)
dcast(melt(setDT(df), id.vars = 'ID')[,
   if(!any(value %in% dementia_codes)) .SD, .(ID)], ID ~ variable)
#     ID disease_code_1 disease_code_2 disease_code_3
#1: 1001           I802           A071           H250
#2: 1002           H356             NA             NA
#3: 1004           D235             NA           I802
#4: 1005           B178             NA             NA
#5: 1008           C761             NA             NA
#6: 1011           J679           A045           D352

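Or this can be done more compactly in base R, with no reshaping. Run it on the original data.frame: setDT() above converts df to a data.table by reference, and on a data.table df[-1] drops the first row rather than the first column.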

df[!Reduce(`|`, lapply(df[-1], `%in%` , dementia_codes)),]
#     ID disease_code_1 disease_code_2 disease_code_3
#1  1001           I802           A071           H250
#2  1002           H356             NA             NA
#4  1004           D235             NA           I802
#5  1005           B178             NA             NA
#8  1008           C761             NA             NA
#11 1011           J679           A045           D352

data

dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", 
 "F00", "G309", "G308", "G301", "G300", "G30", "F01", "F018", "F013", 
 "F012", "F011", "F010", "F01")

edited 13 hours ago

answered 13 hours ago

akrun
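To see what the base R Reduce() one-liner does, here is a small sketch on a cut-down version of the data: the first three rows, with real NA values instead of strings, and a shortened code list chosen only for illustration. lapply() produces one logical vector per code column, and Reduce() with the | operator combines them by elementwise OR. Because %in% returns FALSE rather than NA for missing values, the combined vector can be negated and used as a row index without extra NA handling. The names per_column, any_hit and result are illustrative.

```r
# Cut-down example (first three rows of the question's data, real NAs)
df <- data.frame(
  ID = c(1001, 1002, 1003),
  disease_code_1 = c("I802", "H356", "G560"),
  disease_code_2 = c("A071", NA, "G20"),
  disease_code_3 = c("H250", NA, NA),
  stringsAsFactors = FALSE
)
dementia_codes <- c("G20", "F00", "G30")  # shortened list, for illustration only

# One logical vector per code column: TRUE where that column holds a dementia code
per_column <- lapply(df[-1], `%in%`, dementia_codes)

# Reduce folds the list with elementwise OR:
# per_column[[1]] | per_column[[2]] | per_column[[3]]
any_hit <- Reduce(`|`, per_column)

# %in% gives FALSE (never NA) for missing values, so any_hit has no NAs
result <- df[!any_hit, ]
result
```

With == instead of %in%, a missing value would produce NA in the index and an all-NA row in the result, which is one reason %in% is the safer test here.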

A for loop version with base R, in case you prefer that.

df <- data.frame(ID = c(1001, 1002, 1003, 1004, 1005,1006,1007,1008,1009,1010,1011),
 disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),
 disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),
 disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'), stringsAsFactors = FALSE)

dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308", "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")

new_df <- df[0,]

for (i in 1:nrow(df)) {
  currRow <- df[i,]
  # keep the row only if none of its values is a dementia code
  if (!any(dementia_codes %in% as.character(currRow))) {
    new_df <- rbind(new_df, currRow)
  }
}

new_df
#      ID disease_code_1 disease_code_2 disease_code_3
# 1  1001           I802           A071           H250
# 2  1002           H356             NA             NA
# 4  1004           D235             NA           I802
# 5  1005           B178             NA             NA
# 8  1008           C761             NA             NA
# 11 1011           J679           A045           D352
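Growing new_df with rbind() copies the accumulated rows on every iteration, which becomes slow for large data. A minimal sketch of the same loop that instead records one logical keep flag per row and subsets once at the end; keep and codes_in_row are illustrative names.

```r
# Example data from the question; missing codes are stored as the string 'NA'
df <- data.frame(
  ID = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
  disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),
  disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),
  disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'),
  stringsAsFactors = FALSE
)
dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00", "G309", "G308",
                    "G301", "G300", "G30", "F01", "F018", "F013", "F012", "F011", "F010", "F01")

# One flag per row, preallocated, instead of growing a data frame with rbind()
keep <- logical(nrow(df))

for (i in seq_len(nrow(df))) {
  # Codes in row i (ID column dropped) as a plain character vector
  codes_in_row <- unlist(df[i, -1], use.names = FALSE)
  keep[i] <- !any(codes_in_row %in% dementia_codes)
}

# Subset once, after the loop
new_df <- df[keep, ]
new_df
```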

edited 14 hours ago

answered 14 hours ago

Dunois
