How to aggregate categorical data in R?
I have a data frame with two columns of categorical values (Better, Similar, Worse). I would like to build a table that counts how many times each category appears in each of the two columns.
The data frame I am using is as follows:
      Category.x Category.y
    1     Better     Better
    2     Better     Better
    3    Similar    Similar
    4      Worse    Similar
I would like to come up with a table like this:
            Category.x Category.y
    Better           2          2
    Similar          1          2
    Worse            1          0
How would you go about it?
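For anyone who wants to run the answers, the example data can be rebuilt roughly as below. This is a minimal sketch: the name `df1` is borrowed from the comments, and storing the columns as character strings rather than factors is an assumption, since the question does not say which they are.

```r
# Rebuild the example data from the question.
# Assumption: both columns are plain character vectors, not factors.
df1 <- data.frame(
  Category.x = c("Better", "Better", "Similar", "Worse"),
  Category.y = c("Better", "Better", "Similar", "Similar"),
  stringsAsFactors = FALSE
)
str(df1)
```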
r aggregate
asked Apr 2 at 16:26
Daniel
Looks like you need table(df1)
– akrun
Apr 2 at 16:27
Is it possible to reformat the table, so that I get it as a 3x2 table instead of a 3x3?
– Daniel
Apr 2 at 16:29
I would convert to factor with common levels, lvls <- unique(unlist(df1)); df1[] <- lapply(df1, factor, levels = lvls), and then do the table(df1)
– akrun
Apr 2 at 16:43
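The suggestion in the last comment can be sketched as follows. Note that `table(df1)` on a two-column data frame cross-tabulates Category.x against Category.y, which is not quite the per-column count the question asks for. Tabulating each column separately with `sapply(df1, table)`, once both columns share the same levels, is one way to get the 3x2 layout. The `sapply` step is an addition here and not part of the comment, and the data frame is rebuilt from the question under the assumption of character columns.

```r
# Example data from the question (assumed to be character columns).
df1 <- data.frame(
  Category.x = c("Better", "Better", "Similar", "Worse"),
  Category.y = c("Better", "Better", "Similar", "Similar"),
  stringsAsFactors = FALSE
)

# Give both columns the same factor levels so absent categories count as 0.
lvls <- unique(unlist(df1))
df1[] <- lapply(df1, factor, levels = lvls)

cross  <- table(df1)          # 3x3 cross-tabulation of Category.x by Category.y
counts <- sapply(df1, table)  # 3x2 matrix: one column of counts per variable
counts
```

Without the common levels, the per-column tables would have different lengths (Category.y has no "Worse"), and `sapply` would return a list instead of a matrix.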
3 Answers
As mentioned in the comments, table is standard for this, like
    table(stack(DT))
             ind
    values    Category.x Category.y
      Better           2          2
      Similar          1          2
      Worse            1          0
or
    table(value = unlist(DT), cat = names(DT)[col(DT)])
             cat
    value     Category.x Category.y
      Better           2          2
      Similar          1          2
      Worse            1          0
or
    with(reshape(DT, direction = "long", varying = 1:2),
      table(value = Category, cat = time)
    )
             cat
    value     x y
      Better  2 2
      Similar 1 2
      Worse   1 0
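To see why the first form works, it can help to look at what `stack()` returns before it is tabulated. The sketch below rebuilds the question's data under the name `DT` used in this answer. The construction itself is an assumption, with character columns.

```r
# Example data from the question, under the name used in this answer.
# Assumption: character columns, not factors.
DT <- data.frame(
  Category.x = c("Better", "Better", "Similar", "Worse"),
  Category.y = c("Better", "Better", "Similar", "Similar"),
  stringsAsFactors = FALSE
)

# stack() turns the wide data frame into long form: a `values` column holding
# the categories and an `ind` factor recording the column each value came from.
long <- stack(DT)
head(long, 3)

# Tabulating the two-column long frame counts each (values, ind) pair.
tab <- table(long)
tab
```

As far as I know, `stack()` only picks up columns for which `is.vector()` is true, so factor columns would need converting to character first.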
answered Apr 2 at 16:48
Frank
    # For each column, count how many entries equal each distinct category
    sapply(df1, function(x) sapply(unique(unlist(df1)), function(y) sum(y == x)))
    #        Category.x Category.y
    #Better           2          2
    #Similar          1          2
    #Worse            1          0
answered Apr 2 at 16:33
d.b
One dplyr and tidyr possibility could be:
    library(dplyr)
    library(tidyr)
    df %>%
      gather(var, val) %>%
      count(var, val) %>%
      spread(var, n, fill = 0)
      val     Category.x Category.y
      <chr>        <dbl>      <dbl>
    1 Better           2          2
    2 Similar          1          2
    3 Worse            1          0
First, it reshapes the data from wide to long format, with column "var" holding the variable names and column "val" the corresponding values. Second, it counts the rows for each combination of "var" and "val". Finally, it spreads the counts back into wide format, filling combinations that never occur with 0.
Or with dplyr and reshape2 you can do:
    library(reshape2)
    df %>%
      mutate(rowid = row_number()) %>%
      melt(., id.vars = "rowid") %>%
      count(variable, value) %>%
      dcast(value ~ variable, value.var = "n", fill = 0)
        value Category.x Category.y
    1  Better          2          2
    2 Similar          1          2
    3   Worse          1          0
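In tidyr 1.0.0 and later, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. A hedged sketch of the same three steps (reshape to long, count, reshape to wide) with the newer verbs follows. It assumes dplyr and tidyr 1.1.0 or later are installed, and it rebuilds the question's data as `df` with character columns, which is an assumption about the original data.

```r
library(dplyr)
library(tidyr)

# Example data from the question (assumed to be character columns).
df <- data.frame(
  Category.x = c("Better", "Better", "Similar", "Worse"),
  Category.y = c("Better", "Better", "Similar", "Similar"),
  stringsAsFactors = FALSE
)

res <- df %>%
  # wide -> long: one row per (column name, category) observation
  pivot_longer(everything(), names_to = "var", values_to = "val") %>%
  # count the rows for each (var, val) combination
  count(var, val) %>%
  # long -> wide: one column per original variable, 0 where nothing was counted
  pivot_wider(names_from = var, values_from = n, values_fill = 0L)
res
```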
edited Apr 2 at 17:58
answered Apr 2 at 16:41
tmfmnk
Is var = Category.x and val= c('Better', 'Similar', 'Worse')?
– Daniel
Apr 2 at 16:56
Please see the updated post for commentary.
– tmfmnk
Apr 2 at 17:04