transform histogram to violin plot in r with ggplot












2















I am currently trying to learn r with the help of Hadley Wickham's great resources ("r for data scientists", "ggplot2 Elegant Graphics for Data Analysis"). So far I was able to find answers to all my problems there (thank you so much, Hadley!), but not this time.



Currently, I am working with data from an instrument that estimates particle size by the light the particles scatter (DLS, Zetasizer Nano, Malvern Instruments). The data extracted from this device are some summary statistics (e.g. mean particle size) and histogram data: x = size (split in bins), y = intensity [%].
Here is a tibble of one of my measurements:



   # A tibble: 70 x 3
sample_name intensities bins
<chr> <dbl> <dbl>
1 core formulation 1 0 0.4
2 core formulation 1 0 0.463
3 core formulation 1 0 0.536
4 core formulation 1 0 0.621
5 core formulation 1 0 0.720
6 core formulation 1 0 0.833
7 core formulation 1 0 0.965
8 core formulation 1 0 1.12
9 core formulation 1 0 1.29
10 core formulation 1 0 1.50
11 core formulation 1 0 1.74
12 core formulation 1 0 2.01
13 core formulation 1 0 2.33
14 core formulation 1 0 2.70
15 core formulation 1 0 3.12
16 core formulation 1 0 3.62
17 core formulation 1 0 4.19
18 core formulation 1 0 4.85
19 core formulation 1 0 5.62
20 core formulation 1 0 6.50
21 core formulation 1 0 7.53
22 core formulation 1 0 8.72
23 core formulation 1 0 10.1
24 core formulation 1 0 11.7
25 core formulation 1 0 13.5
26 core formulation 1 0 15.7
27 core formulation 1 0 18.2
28 core formulation 1 0 21.0
29 core formulation 1 0 24.4
30 core formulation 1 0 28.2
31 core formulation 1 0 32.7
32 core formulation 1 0 37.8
33 core formulation 1 0 43.8
34 core formulation 1 0.2 50.8
35 core formulation 1 1.4 58.8
36 core formulation 1 3.7 68.1
37 core formulation 1 6.9 78.8
38 core formulation 1 10.2 91.3
39 core formulation 1 12.9 106.
40 core formulation 1 14.4 122.
41 core formulation 1 14.4 142.
42 core formulation 1 13 164.
43 core formulation 1 10.3 190.
44 core formulation 1 7.1 220.
45 core formulation 1 3.9 255
46 core formulation 1 1.5 295.
47 core formulation 1 0.2 342
48 core formulation 1 0 396.
49 core formulation 1 0 459.
50 core formulation 1 0 531.
51 core formulation 1 0 615.
52 core formulation 1 0 712.
53 core formulation 1 0 825
54 core formulation 1 0 955.
55 core formulation 1 0 1106
56 core formulation 1 0 1281
57 core formulation 1 0 1484
58 core formulation 1 0 1718
59 core formulation 1 0 1990
60 core formulation 1 0 2305
61 core formulation 1 0 2669
62 core formulation 1 0 3091
63 core formulation 1 0 3580
64 core formulation 1 0 4145
65 core formulation 1 0 4801
66 core formulation 1 0 5560
67 core formulation 1 0 6439
68 core formulation 1 0 7456
69 core formulation 1 0 8635
70 core formulation 1 0 10000


Here is the data produced with the dput() command:



structure(list(sample_name = c("core formulation 1", "core formulation 1", 
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1"), intensities = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.2, 1.4, 3.7, 6.9, 10.2, 12.9,
14.4, 14.4, 13, 10.3, 7.1, 3.9, 1.5, 0.2, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), bins = c(0.4,
0.4632, 0.5365, 0.6213, 0.7195, 0.8332, 0.9649, 1.117, 1.294,
1.499, 1.736, 2.01, 2.328, 2.696, 3.122, 3.615, 4.187, 4.849,
5.615, 6.503, 7.531, 8.721, 10.1, 11.7, 13.54, 15.69, 18.17,
21.04, 24.36, 28.21, 32.67, 37.84, 43.82, 50.75, 58.77, 68.06,
78.82, 91.28, 105.7, 122.4, 141.8, 164.2, 190.1, 220.2, 255,
295.3, 342, 396.1, 458.7, 531.2, 615.1, 712.4, 825, 955.4, 1106,
1281, 1484, 1718, 1990, 2305, 2669, 3091, 3580, 4145, 4801, 5560,
6439, 7456, 8635, 10000)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -70L))


I can produce a histogram with no problems from this data:



library(tidyverse)
ggplot (DLS_intensities_core, aes(bins,intensities) ) +
geom_line() +
scale_x_continuous(trans = 'log10')


line plot, intensities against bins



In order to show the overall distribution of my particle size, I would like to convert this data into a violin plot and use the summary statistics provided by the device in a second layer of my plot.



Therefore, I would like to transform this data to be able to create a violin plot from it.



I have already tried feeding it to the stat_density () argument of the violin plot but so far with no success.



Do you know how to create a violin plot from this data?



Thank you very much!



Best,



Dominik










share|improve this question




















  • 1





    Welcome to SO! Could you jump over to stackoverflow.com/tags/r/info and take a look at the suggestions for how to post data (some are in links off of that URL)? I think the SO R contributors could make quick work of this if the data was in a tad easier format to work with.

    – hrbrmstr
    Nov 25 '18 at 16:33











  • I am sorry, I will add it immediately! thank you for pointing it out!

    – Dome
    Nov 25 '18 at 16:39






  • 1





    Wow! Best data dump of the day here! (and, no apologies are required. the SO question entry screen leaves much to be desired). Are you just looking for a violin plot of DLS_intensities_core$bins and DLS_intensities_core$intensities or are there some other transformations you needed performed on them?

    – hrbrmstr
    Nov 25 '18 at 17:30













  • Thanks :-) I think I am just looking for a violin plot of DLS_intensities_core$bins and DLS_intensities_core$intensities. In the end, I want to produce a plot with the following specs: x-axis: particle size [nm]; y-axis: sample name (e.g. "core formulation"); 1st. layer: violin plot (from the data I presented here); 2nd. layer: summary statistics (calculated by the Zetasizer, not shown here for clarity) That means I will need to do some additional transformations for sure, but I want to try it myself first ;-)

    – Dome
    Nov 25 '18 at 17:43


















2















I am currently trying to learn r with the help of Hadley Wickham's great resources ("r for data scientists", "ggplot2 Elegant Graphics for Data Analysis"). So far I was able to find answers to all my problems there (thank you so much, Hadley!), but not this time.



Currently, I am working with data from an instrument that estimates particle size by the light the particles scatter (DLS, Zetasizer Nano, Malvern Instruments). The data extracted from this device are some summary statistics (e.g. mean particle size) and histogram data: x = size (split in bins), y = intensity [%].
Here is a tibble of one of my measurements:



   # A tibble: 70 x 3
sample_name intensities bins
<chr> <dbl> <dbl>
1 core formulation 1 0 0.4
2 core formulation 1 0 0.463
3 core formulation 1 0 0.536
4 core formulation 1 0 0.621
5 core formulation 1 0 0.720
6 core formulation 1 0 0.833
7 core formulation 1 0 0.965
8 core formulation 1 0 1.12
9 core formulation 1 0 1.29
10 core formulation 1 0 1.50
11 core formulation 1 0 1.74
12 core formulation 1 0 2.01
13 core formulation 1 0 2.33
14 core formulation 1 0 2.70
15 core formulation 1 0 3.12
16 core formulation 1 0 3.62
17 core formulation 1 0 4.19
18 core formulation 1 0 4.85
19 core formulation 1 0 5.62
20 core formulation 1 0 6.50
21 core formulation 1 0 7.53
22 core formulation 1 0 8.72
23 core formulation 1 0 10.1
24 core formulation 1 0 11.7
25 core formulation 1 0 13.5
26 core formulation 1 0 15.7
27 core formulation 1 0 18.2
28 core formulation 1 0 21.0
29 core formulation 1 0 24.4
30 core formulation 1 0 28.2
31 core formulation 1 0 32.7
32 core formulation 1 0 37.8
33 core formulation 1 0 43.8
34 core formulation 1 0.2 50.8
35 core formulation 1 1.4 58.8
36 core formulation 1 3.7 68.1
37 core formulation 1 6.9 78.8
38 core formulation 1 10.2 91.3
39 core formulation 1 12.9 106.
40 core formulation 1 14.4 122.
41 core formulation 1 14.4 142.
42 core formulation 1 13 164.
43 core formulation 1 10.3 190.
44 core formulation 1 7.1 220.
45 core formulation 1 3.9 255
46 core formulation 1 1.5 295.
47 core formulation 1 0.2 342
48 core formulation 1 0 396.
49 core formulation 1 0 459.
50 core formulation 1 0 531.
51 core formulation 1 0 615.
52 core formulation 1 0 712.
53 core formulation 1 0 825
54 core formulation 1 0 955.
55 core formulation 1 0 1106
56 core formulation 1 0 1281
57 core formulation 1 0 1484
58 core formulation 1 0 1718
59 core formulation 1 0 1990
60 core formulation 1 0 2305
61 core formulation 1 0 2669
62 core formulation 1 0 3091
63 core formulation 1 0 3580
64 core formulation 1 0 4145
65 core formulation 1 0 4801
66 core formulation 1 0 5560
67 core formulation 1 0 6439
68 core formulation 1 0 7456
69 core formulation 1 0 8635
70 core formulation 1 0 10000


Here is the data produced with the dput() command:



structure(list(sample_name = c("core formulation 1", "core formulation 1", 
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1"), intensities = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.2, 1.4, 3.7, 6.9, 10.2, 12.9,
14.4, 14.4, 13, 10.3, 7.1, 3.9, 1.5, 0.2, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), bins = c(0.4,
0.4632, 0.5365, 0.6213, 0.7195, 0.8332, 0.9649, 1.117, 1.294,
1.499, 1.736, 2.01, 2.328, 2.696, 3.122, 3.615, 4.187, 4.849,
5.615, 6.503, 7.531, 8.721, 10.1, 11.7, 13.54, 15.69, 18.17,
21.04, 24.36, 28.21, 32.67, 37.84, 43.82, 50.75, 58.77, 68.06,
78.82, 91.28, 105.7, 122.4, 141.8, 164.2, 190.1, 220.2, 255,
295.3, 342, 396.1, 458.7, 531.2, 615.1, 712.4, 825, 955.4, 1106,
1281, 1484, 1718, 1990, 2305, 2669, 3091, 3580, 4145, 4801, 5560,
6439, 7456, 8635, 10000)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -70L))


I can produce a histogram with no problems from this data:



library(tidyverse)
ggplot (DLS_intensities_core, aes(bins,intensities) ) +
geom_line() +
scale_x_continuous(trans = 'log10')


line plot, intensities against bins



In order to show the overall distribution of my particle size, I would like to convert this data into a violin plot and use the summary statistics provided by the device in a second layer of my plot.



Therefore, I would like to transform this data to be able to create a violin plot from it.



I have already tried feeding it to the stat_density () argument of the violin plot but so far with no success.



Do you know how to create a violin plot from this data?



Thank you very much!



Best,



Dominik










share|improve this question




















  • 1





    Welcome to SO! Could you jump over to stackoverflow.com/tags/r/info and take a look at the suggestions for how to post data (some are in links off of that URL)? I think the SO R contributors could make quick work of this if the data was in a tad easier format to work with.

    – hrbrmstr
    Nov 25 '18 at 16:33











  • I am sorry, I will add it immediately! thank you for pointing it out!

    – Dome
    Nov 25 '18 at 16:39






  • 1





    Wow! Best data dump of the day here! (and, no apologies are required. the SO question entry screen leaves much to be desired). Are you just looking for a violin plot of DLS_intensities_core$bins and DLS_intensities_core$intensities or are there some other transformations you needed performed on them?

    – hrbrmstr
    Nov 25 '18 at 17:30













  • Thanks :-) I think I am just looking for a violin plot of DLS_intensities_core$bins and DLS_intensities_core$intensities. In the end, I want to produce a plot with the following specs: x-axis: particle size [nm]; y-axis: sample name (e.g. "core formulation"); 1st. layer: violin plot (from the data I presented here); 2nd. layer: summary statistics (calculated by the Zetasizer, not shown here for clarity) That means I will need to do some additional transformations for sure, but I want to try it myself first ;-)

    – Dome
    Nov 25 '18 at 17:43
















2












2








2








I am currently trying to learn r with the help of Hadley Wickham's great resources ("r for data scientists", "ggplot2 Elegant Graphics for Data Analysis"). So far I was able to find answers to all my problems there (thank you so much, Hadley!), but not this time.



Currently, I am working with data from an instrument that estimates particle size by the light the particles scatter (DLS, Zetasizer Nano, Malvern Instruments). The data extracted from this device are some summary statistics (e.g. mean particle size) and histogram data: x = size (split in bins), y = intensity [%].
Here is a tibble of one of my measurements:



   # A tibble: 70 x 3
sample_name intensities bins
<chr> <dbl> <dbl>
1 core formulation 1 0 0.4
2 core formulation 1 0 0.463
3 core formulation 1 0 0.536
4 core formulation 1 0 0.621
5 core formulation 1 0 0.720
6 core formulation 1 0 0.833
7 core formulation 1 0 0.965
8 core formulation 1 0 1.12
9 core formulation 1 0 1.29
10 core formulation 1 0 1.50
11 core formulation 1 0 1.74
12 core formulation 1 0 2.01
13 core formulation 1 0 2.33
14 core formulation 1 0 2.70
15 core formulation 1 0 3.12
16 core formulation 1 0 3.62
17 core formulation 1 0 4.19
18 core formulation 1 0 4.85
19 core formulation 1 0 5.62
20 core formulation 1 0 6.50
21 core formulation 1 0 7.53
22 core formulation 1 0 8.72
23 core formulation 1 0 10.1
24 core formulation 1 0 11.7
25 core formulation 1 0 13.5
26 core formulation 1 0 15.7
27 core formulation 1 0 18.2
28 core formulation 1 0 21.0
29 core formulation 1 0 24.4
30 core formulation 1 0 28.2
31 core formulation 1 0 32.7
32 core formulation 1 0 37.8
33 core formulation 1 0 43.8
34 core formulation 1 0.2 50.8
35 core formulation 1 1.4 58.8
36 core formulation 1 3.7 68.1
37 core formulation 1 6.9 78.8
38 core formulation 1 10.2 91.3
39 core formulation 1 12.9 106.
40 core formulation 1 14.4 122.
41 core formulation 1 14.4 142.
42 core formulation 1 13 164.
43 core formulation 1 10.3 190.
44 core formulation 1 7.1 220.
45 core formulation 1 3.9 255
46 core formulation 1 1.5 295.
47 core formulation 1 0.2 342
48 core formulation 1 0 396.
49 core formulation 1 0 459.
50 core formulation 1 0 531.
51 core formulation 1 0 615.
52 core formulation 1 0 712.
53 core formulation 1 0 825
54 core formulation 1 0 955.
55 core formulation 1 0 1106
56 core formulation 1 0 1281
57 core formulation 1 0 1484
58 core formulation 1 0 1718
59 core formulation 1 0 1990
60 core formulation 1 0 2305
61 core formulation 1 0 2669
62 core formulation 1 0 3091
63 core formulation 1 0 3580
64 core formulation 1 0 4145
65 core formulation 1 0 4801
66 core formulation 1 0 5560
67 core formulation 1 0 6439
68 core formulation 1 0 7456
69 core formulation 1 0 8635
70 core formulation 1 0 10000


Here is the data produced with the dput() command:



structure(list(sample_name = c("core formulation 1", "core formulation 1", 
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1"), intensities = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.2, 1.4, 3.7, 6.9, 10.2, 12.9,
14.4, 14.4, 13, 10.3, 7.1, 3.9, 1.5, 0.2, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), bins = c(0.4,
0.4632, 0.5365, 0.6213, 0.7195, 0.8332, 0.9649, 1.117, 1.294,
1.499, 1.736, 2.01, 2.328, 2.696, 3.122, 3.615, 4.187, 4.849,
5.615, 6.503, 7.531, 8.721, 10.1, 11.7, 13.54, 15.69, 18.17,
21.04, 24.36, 28.21, 32.67, 37.84, 43.82, 50.75, 58.77, 68.06,
78.82, 91.28, 105.7, 122.4, 141.8, 164.2, 190.1, 220.2, 255,
295.3, 342, 396.1, 458.7, 531.2, 615.1, 712.4, 825, 955.4, 1106,
1281, 1484, 1718, 1990, 2305, 2669, 3091, 3580, 4145, 4801, 5560,
6439, 7456, 8635, 10000)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -70L))


I can produce a histogram with no problems from this data:



library(tidyverse)
ggplot (DLS_intensities_core, aes(bins,intensities) ) +
geom_line() +
scale_x_continuous(trans = 'log10')


line plot, intensities against bins



In order to show the overall distribution of my particle size, I would like to convert this data into a violin plot and use the summary statistics provided by the device in a second layer of my plot.



Therefore, I would like to transform this data to be able to create a violin plot from it.



I have already tried feeding it to the stat_density () argument of the violin plot but so far with no success.



Do you know how to create a violin plot from this data?



Thank you very much!



Best,



Dominik










share|improve this question
















I am currently trying to learn r with the help of Hadley Wickham's great resources ("r for data scientists", "ggplot2 Elegant Graphics for Data Analysis"). So far I was able to find answers to all my problems there (thank you so much, Hadley!), but not this time.



Currently, I am working with data from an instrument that estimates particle size by the light the particles scatter (DLS, Zetasizer Nano, Malvern Instruments). The data extracted from this device are some summary statistics (e.g. mean particle size) and histogram data: x = size (split in bins), y = intensity [%].
Here is a tibble of one of my measurements:



   # A tibble: 70 x 3
sample_name intensities bins
<chr> <dbl> <dbl>
1 core formulation 1 0 0.4
2 core formulation 1 0 0.463
3 core formulation 1 0 0.536
4 core formulation 1 0 0.621
5 core formulation 1 0 0.720
6 core formulation 1 0 0.833
7 core formulation 1 0 0.965
8 core formulation 1 0 1.12
9 core formulation 1 0 1.29
10 core formulation 1 0 1.50
11 core formulation 1 0 1.74
12 core formulation 1 0 2.01
13 core formulation 1 0 2.33
14 core formulation 1 0 2.70
15 core formulation 1 0 3.12
16 core formulation 1 0 3.62
17 core formulation 1 0 4.19
18 core formulation 1 0 4.85
19 core formulation 1 0 5.62
20 core formulation 1 0 6.50
21 core formulation 1 0 7.53
22 core formulation 1 0 8.72
23 core formulation 1 0 10.1
24 core formulation 1 0 11.7
25 core formulation 1 0 13.5
26 core formulation 1 0 15.7
27 core formulation 1 0 18.2
28 core formulation 1 0 21.0
29 core formulation 1 0 24.4
30 core formulation 1 0 28.2
31 core formulation 1 0 32.7
32 core formulation 1 0 37.8
33 core formulation 1 0 43.8
34 core formulation 1 0.2 50.8
35 core formulation 1 1.4 58.8
36 core formulation 1 3.7 68.1
37 core formulation 1 6.9 78.8
38 core formulation 1 10.2 91.3
39 core formulation 1 12.9 106.
40 core formulation 1 14.4 122.
41 core formulation 1 14.4 142.
42 core formulation 1 13 164.
43 core formulation 1 10.3 190.
44 core formulation 1 7.1 220.
45 core formulation 1 3.9 255
46 core formulation 1 1.5 295.
47 core formulation 1 0.2 342
48 core formulation 1 0 396.
49 core formulation 1 0 459.
50 core formulation 1 0 531.
51 core formulation 1 0 615.
52 core formulation 1 0 712.
53 core formulation 1 0 825
54 core formulation 1 0 955.
55 core formulation 1 0 1106
56 core formulation 1 0 1281
57 core formulation 1 0 1484
58 core formulation 1 0 1718
59 core formulation 1 0 1990
60 core formulation 1 0 2305
61 core formulation 1 0 2669
62 core formulation 1 0 3091
63 core formulation 1 0 3580
64 core formulation 1 0 4145
65 core formulation 1 0 4801
66 core formulation 1 0 5560
67 core formulation 1 0 6439
68 core formulation 1 0 7456
69 core formulation 1 0 8635
70 core formulation 1 0 10000


Here is the data produced with the dput() command:



structure(list(sample_name = c("core formulation 1", "core formulation 1", 
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1", "core formulation 1",
"core formulation 1", "core formulation 1"), intensities = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.2, 1.4, 3.7, 6.9, 10.2, 12.9,
14.4, 14.4, 13, 10.3, 7.1, 3.9, 1.5, 0.2, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), bins = c(0.4,
0.4632, 0.5365, 0.6213, 0.7195, 0.8332, 0.9649, 1.117, 1.294,
1.499, 1.736, 2.01, 2.328, 2.696, 3.122, 3.615, 4.187, 4.849,
5.615, 6.503, 7.531, 8.721, 10.1, 11.7, 13.54, 15.69, 18.17,
21.04, 24.36, 28.21, 32.67, 37.84, 43.82, 50.75, 58.77, 68.06,
78.82, 91.28, 105.7, 122.4, 141.8, 164.2, 190.1, 220.2, 255,
295.3, 342, 396.1, 458.7, 531.2, 615.1, 712.4, 825, 955.4, 1106,
1281, 1484, 1718, 1990, 2305, 2669, 3091, 3580, 4145, 4801, 5560,
6439, 7456, 8635, 10000)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -70L))


I can produce a histogram with no problems from this data:



library(tidyverse)
ggplot (DLS_intensities_core, aes(bins,intensities) ) +
geom_line() +
scale_x_continuous(trans = 'log10')


line plot, intensities against bins



In order to show the overall distribution of my particle size, I would like to convert this data into a violin plot and use the summary statistics provided by the device in a second layer of my plot.



Therefore, I would like to transform this data to be able to create a violin plot from it.



I have already tried feeding it to the stat_density () argument of the violin plot but so far with no success.



Do you know how to create a violin plot from this data?



Thank you very much!



Best,



Dominik







r ggplot2 histogram transformation violin-plot






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 25 '18 at 17:01







Dome

















asked Nov 25 '18 at 16:30









DomeDome

115




115








  • 1





    Welcome to SO! Could you jump over to stackoverflow.com/tags/r/info and take a look at the suggestions for how to post data (some are in links off of that URL)? I think the SO R contributors could make quick work of this if the data was in a tad easier format to work with.

    – hrbrmstr
    Nov 25 '18 at 16:33











  • I am sorry, I will add it immediately! thank you for pointing it out!

    – Dome
    Nov 25 '18 at 16:39






  • 1





    Wow! Best data dump of the day here! (and, no apologies are required. the SO question entry screen leaves much to be desired). Are you just looking for a violin plot of DLS_intensities_core$bins and DLS_intensities_core$intensities or are there some other transformations you needed performed on them?

    – hrbrmstr
    Nov 25 '18 at 17:30













  • Thanks :-) I think I am just looking for a violin plot of DLS_intensities_core$bins and DLS_intensities_core$intensities. In the end, I want to produce a plot with the following specs: x-axis: particle size [nm]; y-axis: sample name (e.g. "core formulation"); 1st. layer: violin plot (from the data I presented here); 2nd. layer: summary statistics (calculated by the Zetasizer, not shown here for clarity) That means I will need to do some additional transformations for sure, but I want to try it myself first ;-)

    – Dome
    Nov 25 '18 at 17:43
















  • 1





    Welcome to SO! Could you jump over to stackoverflow.com/tags/r/info and take a look at the suggestions for how to post data (some are in links off of that URL)? I think the SO R contributors could make quick work of this if the data was in a tad easier format to work with.

    – hrbrmstr
    Nov 25 '18 at 16:33











  • I am sorry, I will add it immediately! thank you for pointing it out!

    – Dome
    Nov 25 '18 at 16:39






  • 1





    Wow! Best data dump of the day here! (and, no apologies are required. the SO question entry screen leaves much to be desired). Are you just looking for a violin plot of DLS_intensities_core$bins and DLS_intensities_core$intensities or are there some other transformations you needed performed on them?

    – hrbrmstr
    Nov 25 '18 at 17:30













  • Thanks :-) I think I am just looking for a violin plot of DLS_intensities_core$bins and DLS_intensities_core$intensities. In the end, I want to produce a plot with the following specs: x-axis: particle size [nm]; y-axis: sample name (e.g. "core formulation"); 1st. layer: violin plot (from the data I presented here); 2nd. layer: summary statistics (calculated by the Zetasizer, not shown here for clarity) That means I will need to do some additional transformations for sure, but I want to try it myself first ;-)

    – Dome
    Nov 25 '18 at 17:43










1




1





Welcome to SO! Could you jump over to stackoverflow.com/tags/r/info and take a look at the suggestions for how to post data (some are in links off of that URL)? I think the SO R contributors could make quick work of this if the data was in a tad easier format to work with.

– hrbrmstr
Nov 25 '18 at 16:33





Welcome to SO! Could you jump over to stackoverflow.com/tags/r/info and take a look at the suggestions for how to post data (some are in links off of that URL)? I think the SO R contributors could make quick work of this if the data was in a tad easier format to work with.

– hrbrmstr
Nov 25 '18 at 16:33













I am sorry, I will add it immediately! thank you for pointing it out!

– Dome
Nov 25 '18 at 16:39





I am sorry, I will add it immediately! thank you for pointing it out!

– Dome
Nov 25 '18 at 16:39




1




1





Wow! Best data dump of the day here! (and, no apologies are required. the SO question entry screen leaves much to be desired). Are you just looking for a violin plot of DLS_intensities_core$bins and DLS_intensities_core$intensities or are there some other transformations you needed performed on them?

– hrbrmstr
Nov 25 '18 at 17:30







Wow! Best data dump of the day here! (and, no apologies are required. the SO question entry screen leaves much to be desired). Are you just looking for a violin plot of DLS_intensities_core$bins and DLS_intensities_core$intensities or are there some other transformations you needed performed on them?

– hrbrmstr
Nov 25 '18 at 17:30















Thanks :-) I think I am just looking for a violin plot of DLS_intensities_core$bins and DLS_intensities_core$intensities. In the end, I want to produce a plot with the following specs: x-axis: particle size [nm]; y-axis: sample name (e.g. "core formulation"); 1st. layer: violin plot (from the data I presented here); 2nd. layer: summary statistics (calculated by the Zetasizer, not shown here for clarity) That means I will need to do some additional transformations for sure, but I want to try it myself first ;-)

– Dome
Nov 25 '18 at 17:43







Thanks :-) I think I am just looking for a violin plot of DLS_intensities_core$bins and DLS_intensities_core$intensities. In the end, I want to produce a plot with the following specs: x-axis: particle size [nm]; y-axis: sample name (e.g. "core formulation"); 1st. layer: violin plot (from the data I presented here); 2nd. layer: summary statistics (calculated by the Zetasizer, not shown here for clarity) That means I will need to do some additional transformations for sure, but I want to try it myself first ;-)

– Dome
Nov 25 '18 at 17:43














2 Answers
2






active

oldest

votes


















1














I'll update this (if needed) after your reply to the second comment. You can get violin plots of both bins and intensities with:



library(hrbrthemes)

gather(DLS_intensities_core, measure, value, -sample_name) %>%
ggplot(aes(measure, value)) +
geom_violin(scale = "count") +
scale_y_comma() +
facet_wrap(~measure, scales="free") +
labs(
x = NULL, y = "A better label than this",
title = "A better title than this",
caption = "NOTE: Free Y scales"
) +
theme_ipsum_rc(grid="Y") +
theme(axis.text.x = element_blank())


enter image description here



I generally like to layer in the points as well:



gather(DLS_intensities_core, measure, value, -sample_name) %>% 
ggplot(aes(measure, value)) +
geom_violin(scale = "count") +
ggbeeswarm::geom_quasirandom() +
scale_y_comma() +
facet_wrap(~measure, scales="free") +
labs(
x = NULL, y = "A better label than this",
title = "A better title than this",
caption = "NOTE: Free Y scales"
) +
theme_ipsum_rc(grid="Y") +
theme(axis.text.x = element_blank())


enter image description here



Per your comment, perhaps this might be a better way to show the bins distribution along with the relationship with intensities:



library(hrbrthemes)
library(tidyverse)

ggplot(DLS_intensities_core, aes(x="", bins)) +
geom_violin(scale = "count") +
ggbeeswarm::geom_quasirandom(
aes(size = intensities, fill = intensities), shape = 21
) +
scale_y_comma(trans="log10") +
viridis::scale_fill_viridis(direction = -1, trans = "log1p") +
scale_size_continuous(trans = "log1p", range = c(2, 10)) +
guides(fill = guide_legend()) +
labs(
x = NULL, y = "A better label than this",
title = "A better title than this"
) +
theme_ipsum_rc(grid="Y")


enter image description here



You'd have to do some other, custom transformation to try to get the violin shape to vary with the intensities (and it wouldn't really be reflecting the distribution at that point).






share|improve this answer


























  • thank you very much for your plots and the associated code, I like the theme of your plots very much! And the addition of dots to violin plots is a good idea :-) The first plot (bins) is really close to what I want to have: Size (or bins) on the y -axis (in log10 scale) and the expansion in x-direction of the plot is the value of the intensity (not the number of cases). Does this help or is it just the wrong plot for this idea?

    – Dome
    Nov 25 '18 at 18:11











  • See if the latest update does a bit more of what you want.

    – hrbrmstr
    Nov 25 '18 at 18:23











  • yes, this goes in the right direction! However, I would like the intensities to be depicted by the extension of the violin plot in x direction (in a way that the biggest extension of the violin plot is at around 100 on the y-axis, at the position that you printed the big dots at.).

    – Dome
    Nov 26 '18 at 9:18











  • I had another thought while in the shower: Maybe my tibble is not in the right format for this type of plot. Since the violin plot's stat counts rows and depicts a summary, the most straight forward way would be to multiplay the bin size by intensity and create the corresponding amount of rows with this precise (bin-) size. E.g. if the size is 100 and the corresponding intensity 20, then I would create 20 new rows with bin = 100. I will try this approach today and let you know what I found :-) Thanks again for your help! :-)

    – Dome
    Nov 26 '18 at 9:18



















0














I found a solution to my problem, it is probably not very elegant:



library (tidyverse)

DLS_intensities_core <- DLS_intensities_core %>%
mutate(counts = intensities * 10 )

vectors <- DLS_intensities_core %>%
filter(counts > 0)

bins_v <- vectors$bins
count_v <- vectors$counts

violin_DLSdata <- as.tibble(rep.int(bins_v, count_v))
violin_DLSdata$sample_name <- "core formulation 1"

ggplot (violin_DLSdata, aes(sample_name, value)) +
geom_violin() +
labs(
x = NULL, y = "size"
) +
scale_y_continuous(trans = 'log10', limits = c(1, 1000))


violin_plot 1
for my whole dataset it looks like this:
violin_plot with summary statistics
I have added: summary statistic as red dots with errorbars.



What do you think?






share|improve this answer

























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53469540%2ftransform-histogram-to-violin-plot-in-r-with-ggplot%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    1














    I'll update this (if needed) after your reply to the second comment. You can get violin plots of both bins and intensities with:



    library(hrbrthemes)

    gather(DLS_intensities_core, measure, value, -sample_name) %>%
    ggplot(aes(measure, value)) +
    geom_violin(scale = "count") +
    scale_y_comma() +
    facet_wrap(~measure, scales="free") +
    labs(
    x = NULL, y = "A better label than this",
    title = "A better title than this",
    caption = "NOTE: Free Y scales"
    ) +
    theme_ipsum_rc(grid="Y") +
    theme(axis.text.x = element_blank())


    enter image description here



    I generally like to layer in the points as well:



    gather(DLS_intensities_core, measure, value, -sample_name) %>% 
    ggplot(aes(measure, value)) +
    geom_violin(scale = "count") +
    ggbeeswarm::geom_quasirandom() +
    scale_y_comma() +
    facet_wrap(~measure, scales="free") +
    labs(
    x = NULL, y = "A better label than this",
    title = "A better title than this",
    caption = "NOTE: Free Y scales"
    ) +
    theme_ipsum_rc(grid="Y") +
    theme(axis.text.x = element_blank())


    enter image description here



    Per your comment, perhaps this might be a better way to show the bins distribution along with the relationship with intensities:



    library(hrbrthemes)
    library(tidyverse)

    ggplot(DLS_intensities_core, aes(x="", bins)) +
    geom_violin(scale = "count") +
    ggbeeswarm::geom_quasirandom(
    aes(size = intensities, fill = intensities), shape = 21
    ) +
    scale_y_comma(trans="log10") +
    viridis::scale_fill_viridis(direction = -1, trans = "log1p") +
    scale_size_continuous(trans = "log1p", range = c(2, 10)) +
    guides(fill = guide_legend()) +
    labs(
    x = NULL, y = "A better label than this",
    title = "A better title than this"
    ) +
    theme_ipsum_rc(grid="Y")


    enter image description here



    You'd have to do some other, custom transformation to try to get the violin shape to vary with the intensities (and it wouldn't really be reflecting the distribution at that point).






    share|improve this answer


























    • thank you very much for your plots and the associated code, I like the theme of your plots very much! And the addition of dots to violin plots is a good idea :-) The first plot (bins) is really close to what I want to have: Size (or bins) on the y -axis (in log10 scale) and the expansion in x-direction of the plot is the value of the intensity (not the number of cases). Does this help or is it just the wrong plot for this idea?

      – Dome
      Nov 25 '18 at 18:11











    • See if the latest update does a bit more of what you want.

      – hrbrmstr
      Nov 25 '18 at 18:23











    • yes, this goes in the right direction! However, I would like the intensities to be depicted by the extension of the violin plot in x direction (in a way that the biggest extension of the violin plot is at around 100 on the y-axis, at the position that you printed the big dots at.).

      – Dome
      Nov 26 '18 at 9:18











    • I had another thought while in the shower: Maybe my tibble is not in the right format for this type of plot. Since the violin plot's stat counts rows and depicts a summary, the most straight forward way would be to multiplay the bin size by intensity and create the corresponding amount of rows with this precise (bin-) size. E.g. if the size is 100 and the corresponding intensity 20, then I would create 20 new rows with bin = 100. I will try this approach today and let you know what I found :-) Thanks again for your help! :-)

      – Dome
      Nov 26 '18 at 9:18
















    1














    I'll update this (if needed) after your reply to the second comment. You can get violin plots of both bins and intensities with:



    library(hrbrthemes)

    gather(DLS_intensities_core, measure, value, -sample_name) %>%
    ggplot(aes(measure, value)) +
    geom_violin(scale = "count") +
    scale_y_comma() +
    facet_wrap(~measure, scales="free") +
    labs(
    x = NULL, y = "A better label than this",
    title = "A better title than this",
    caption = "NOTE: Free Y scales"
    ) +
    theme_ipsum_rc(grid="Y") +
    theme(axis.text.x = element_blank())


    enter image description here



    I generally like to layer in the points as well:



    gather(DLS_intensities_core, measure, value, -sample_name) %>% 
    ggplot(aes(measure, value)) +
    geom_violin(scale = "count") +
    ggbeeswarm::geom_quasirandom() +
    scale_y_comma() +
    facet_wrap(~measure, scales="free") +
    labs(
    x = NULL, y = "A better label than this",
    title = "A better title than this",
    caption = "NOTE: Free Y scales"
    ) +
    theme_ipsum_rc(grid="Y") +
    theme(axis.text.x = element_blank())


    enter image description here



    Per your comment, perhaps this might be a better way to show the bins distribution along with the relationship with intensities:



    library(hrbrthemes)
    library(tidyverse)

    ggplot(DLS_intensities_core, aes(x="", bins)) +
    geom_violin(scale = "count") +
    ggbeeswarm::geom_quasirandom(
    aes(size = intensities, fill = intensities), shape = 21
    ) +
    scale_y_comma(trans="log10") +
    viridis::scale_fill_viridis(direction = -1, trans = "log1p") +
    scale_size_continuous(trans = "log1p", range = c(2, 10)) +
    guides(fill = guide_legend()) +
    labs(
    x = NULL, y = "A better label than this",
    title = "A better title than this"
    ) +
    theme_ipsum_rc(grid="Y")


    enter image description here



    You'd have to do some other, custom transformation to try to get the violin shape to vary with the intensities (and it wouldn't really be reflecting the distribution at that point).






    share|improve this answer


























    • thank you very much for your plots and the associated code, I like the theme of your plots very much! And the addition of dots to violin plots is a good idea :-) The first plot (bins) is really close to what I want to have: Size (or bins) on the y -axis (in log10 scale) and the expansion in x-direction of the plot is the value of the intensity (not the number of cases). Does this help or is it just the wrong plot for this idea?

      – Dome
      Nov 25 '18 at 18:11











    • See if the latest update does a bit more of what you want.

      – hrbrmstr
      Nov 25 '18 at 18:23











    • yes, this goes in the right direction! However, I would like the intensities to be depicted by the extension of the violin plot in x direction (in a way that the biggest extension of the violin plot is at around 100 on the y-axis, at the position that you printed the big dots at.).

      – Dome
      Nov 26 '18 at 9:18











    • I had another thought while in the shower: Maybe my tibble is not in the right format for this type of plot. Since the violin plot's stat counts rows and depicts a summary, the most straight forward way would be to multiplay the bin size by intensity and create the corresponding amount of rows with this precise (bin-) size. E.g. if the size is 100 and the corresponding intensity 20, then I would create 20 new rows with bin = 100. I will try this approach today and let you know what I found :-) Thanks again for your help! :-)

      – Dome
      Nov 26 '18 at 9:18














    1












    1








    1







    I'll update this (if needed) after your reply to the second comment. You can get violin plots of both bins and intensities with:



    library(hrbrthemes)

    gather(DLS_intensities_core, measure, value, -sample_name) %>%
    ggplot(aes(measure, value)) +
    geom_violin(scale = "count") +
    scale_y_comma() +
    facet_wrap(~measure, scales="free") +
    labs(
    x = NULL, y = "A better label than this",
    title = "A better title than this",
    caption = "NOTE: Free Y scales"
    ) +
    theme_ipsum_rc(grid="Y") +
    theme(axis.text.x = element_blank())


    enter image description here



    I generally like to layer in the points as well:



    gather(DLS_intensities_core, measure, value, -sample_name) %>% 
    ggplot(aes(measure, value)) +
    geom_violin(scale = "count") +
    ggbeeswarm::geom_quasirandom() +
    scale_y_comma() +
    facet_wrap(~measure, scales="free") +
    labs(
    x = NULL, y = "A better label than this",
    title = "A better title than this",
    caption = "NOTE: Free Y scales"
    ) +
    theme_ipsum_rc(grid="Y") +
    theme(axis.text.x = element_blank())


    enter image description here



    Per your comment, perhaps this might be a better way to show the bins distribution along with the relationship with intensities:



    library(hrbrthemes)
    library(tidyverse)

    ggplot(DLS_intensities_core, aes(x="", bins)) +
    geom_violin(scale = "count") +
    ggbeeswarm::geom_quasirandom(
    aes(size = intensities, fill = intensities), shape = 21
    ) +
    scale_y_comma(trans="log10") +
    viridis::scale_fill_viridis(direction = -1, trans = "log1p") +
    scale_size_continuous(trans = "log1p", range = c(2, 10)) +
    guides(fill = guide_legend()) +
    labs(
    x = NULL, y = "A better label than this",
    title = "A better title than this"
    ) +
    theme_ipsum_rc(grid="Y")


    enter image description here



    You'd have to do some other, custom transformation to try to get the violin shape to vary with the intensities (and it wouldn't really be reflecting the distribution at that point).






    share|improve this answer















    I'll update this (if needed) after your reply to the second comment. You can get violin plots of both bins and intensities with:



    library(hrbrthemes)

    gather(DLS_intensities_core, measure, value, -sample_name) %>%
    ggplot(aes(measure, value)) +
    geom_violin(scale = "count") +
    scale_y_comma() +
    facet_wrap(~measure, scales="free") +
    labs(
    x = NULL, y = "A better label than this",
    title = "A better title than this",
    caption = "NOTE: Free Y scales"
    ) +
    theme_ipsum_rc(grid="Y") +
    theme(axis.text.x = element_blank())


    enter image description here



    I generally like to layer in the points as well:



    gather(DLS_intensities_core, measure, value, -sample_name) %>% 
    ggplot(aes(measure, value)) +
    geom_violin(scale = "count") +
    ggbeeswarm::geom_quasirandom() +
    scale_y_comma() +
    facet_wrap(~measure, scales="free") +
    labs(
    x = NULL, y = "A better label than this",
    title = "A better title than this",
    caption = "NOTE: Free Y scales"
    ) +
    theme_ipsum_rc(grid="Y") +
    theme(axis.text.x = element_blank())


    enter image description here



    Per your comment, perhaps this might be a better way to show the bins distribution along with the relationship with intensities:



    library(hrbrthemes)
    library(tidyverse)

    ggplot(DLS_intensities_core, aes(x="", bins)) +
    geom_violin(scale = "count") +
    ggbeeswarm::geom_quasirandom(
    aes(size = intensities, fill = intensities), shape = 21
    ) +
    scale_y_comma(trans="log10") +
    viridis::scale_fill_viridis(direction = -1, trans = "log1p") +
    scale_size_continuous(trans = "log1p", range = c(2, 10)) +
    guides(fill = guide_legend()) +
    labs(
    x = NULL, y = "A better label than this",
    title = "A better title than this"
    ) +
    theme_ipsum_rc(grid="Y")


    enter image description here



    You'd have to do some other, custom transformation to try to get the violin shape to vary with the intensities (and it wouldn't really be reflecting the distribution at that point).







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Nov 25 '18 at 18:22

























    answered Nov 25 '18 at 17:36









    hrbrmstrhrbrmstr

    61.3k691152




    61.3k691152













    • thank you very much for your plots and the associated code, I like the theme of your plots very much! And the addition of dots to violin plots is a good idea :-) The first plot (bins) is really close to what I want to have: Size (or bins) on the y -axis (in log10 scale) and the expansion in x-direction of the plot is the value of the intensity (not the number of cases). Does this help or is it just the wrong plot for this idea?

      – Dome
      Nov 25 '18 at 18:11











    • See if the latest update does a bit more of what you want.

      – hrbrmstr
      Nov 25 '18 at 18:23











    • yes, this goes in the right direction! However, I would like the intensities to be depicted by the extension of the violin plot in x direction (in a way that the biggest extension of the violin plot is at around 100 on the y-axis, at the position that you printed the big dots at.).

      – Dome
      Nov 26 '18 at 9:18











    • I had another thought while in the shower: Maybe my tibble is not in the right format for this type of plot. Since the violin plot's stat counts rows and depicts a summary, the most straight forward way would be to multiplay the bin size by intensity and create the corresponding amount of rows with this precise (bin-) size. E.g. if the size is 100 and the corresponding intensity 20, then I would create 20 new rows with bin = 100. I will try this approach today and let you know what I found :-) Thanks again for your help! :-)

      – Dome
      Nov 26 '18 at 9:18



















    • thank you very much for your plots and the associated code, I like the theme of your plots very much! And the addition of dots to violin plots is a good idea :-) The first plot (bins) is really close to what I want to have: Size (or bins) on the y -axis (in log10 scale) and the expansion in x-direction of the plot is the value of the intensity (not the number of cases). Does this help or is it just the wrong plot for this idea?

      – Dome
      Nov 25 '18 at 18:11











    • See if the latest update does a bit more of what you want.

      – hrbrmstr
      Nov 25 '18 at 18:23











    • yes, this goes in the right direction! However, I would like the intensities to be depicted by the extension of the violin plot in x direction (in a way that the biggest extension of the violin plot is at around 100 on the y-axis, at the position that you printed the big dots at.).

      – Dome
      Nov 26 '18 at 9:18











    • I had another thought while in the shower: Maybe my tibble is not in the right format for this type of plot. Since the violin plot's stat counts rows and depicts a summary, the most straight forward way would be to multiplay the bin size by intensity and create the corresponding amount of rows with this precise (bin-) size. E.g. if the size is 100 and the corresponding intensity 20, then I would create 20 new rows with bin = 100. I will try this approach today and let you know what I found :-) Thanks again for your help! :-)

      – Dome
      Nov 26 '18 at 9:18

















    thank you very much for your plots and the associated code, I like the theme of your plots very much! And the addition of dots to violin plots is a good idea :-) The first plot (bins) is really close to what I want to have: Size (or bins) on the y -axis (in log10 scale) and the expansion in x-direction of the plot is the value of the intensity (not the number of cases). Does this help or is it just the wrong plot for this idea?

    – Dome
    Nov 25 '18 at 18:11





    thank you very much for your plots and the associated code, I like the theme of your plots very much! And the addition of dots to violin plots is a good idea :-) The first plot (bins) is really close to what I want to have: Size (or bins) on the y -axis (in log10 scale) and the expansion in x-direction of the plot is the value of the intensity (not the number of cases). Does this help or is it just the wrong plot for this idea?

    – Dome
    Nov 25 '18 at 18:11













    See if the latest update does a bit more of what you want.

    – hrbrmstr
    Nov 25 '18 at 18:23





    See if the latest update does a bit more of what you want.

    – hrbrmstr
    Nov 25 '18 at 18:23













    yes, this goes in the right direction! However, I would like the intensities to be depicted by the extension of the violin plot in x direction (in a way that the biggest extension of the violin plot is at around 100 on the y-axis, at the position that you printed the big dots at.).

    – Dome
    Nov 26 '18 at 9:18





    yes, this goes in the right direction! However, I would like the intensities to be depicted by the extension of the violin plot in x direction (in a way that the biggest extension of the violin plot is at around 100 on the y-axis, at the position that you printed the big dots at.).

    – Dome
    Nov 26 '18 at 9:18













    I had another thought while in the shower: Maybe my tibble is not in the right format for this type of plot. Since the violin plot's stat counts rows and depicts a summary, the most straight forward way would be to multiplay the bin size by intensity and create the corresponding amount of rows with this precise (bin-) size. E.g. if the size is 100 and the corresponding intensity 20, then I would create 20 new rows with bin = 100. I will try this approach today and let you know what I found :-) Thanks again for your help! :-)

    – Dome
    Nov 26 '18 at 9:18





    I had another thought while in the shower: Maybe my tibble is not in the right format for this type of plot. Since the violin plot's stat counts rows and depicts a summary, the most straight forward way would be to multiplay the bin size by intensity and create the corresponding amount of rows with this precise (bin-) size. E.g. if the size is 100 and the corresponding intensity 20, then I would create 20 new rows with bin = 100. I will try this approach today and let you know what I found :-) Thanks again for your help! :-)

    – Dome
    Nov 26 '18 at 9:18













    0














    I found a solution to my problem, it is probably not very elegant:



    library (tidyverse)

    DLS_intensities_core <- DLS_intensities_core %>%
    mutate(counts = intensities * 10 )

    vectors <- DLS_intensities_core %>%
    filter(counts > 0)

    bins_v <- vectors$bins
    count_v <- vectors$counts

    violin_DLSdata <- as.tibble(rep.int(bins_v, count_v))
    violin_DLSdata$sample_name <- "core formulation 1"

    ggplot (violin_DLSdata, aes(sample_name, value)) +
    geom_violin() +
    labs(
    x = NULL, y = "size"
    ) +
    scale_y_continuous(trans = 'log10', limits = c(1, 1000))


    violin_plot 1
    for my whole dataset it looks like this:
    violin_plot with summary statistics
    I have added: summary statistic as red dots with errorbars.



    What do you think?






    share|improve this answer






























      0














      I found a solution to my problem, it is probably not very elegant:



      library (tidyverse)

      DLS_intensities_core <- DLS_intensities_core %>%
      mutate(counts = intensities * 10 )

      vectors <- DLS_intensities_core %>%
      filter(counts > 0)

      bins_v <- vectors$bins
      count_v <- vectors$counts

      violin_DLSdata <- as.tibble(rep.int(bins_v, count_v))
      violin_DLSdata$sample_name <- "core formulation 1"

      ggplot (violin_DLSdata, aes(sample_name, value)) +
      geom_violin() +
      labs(
      x = NULL, y = "size"
      ) +
      scale_y_continuous(trans = 'log10', limits = c(1, 1000))


      violin_plot 1
      for my whole dataset it looks like this:
      violin_plot with summary statistics
      I have added: summary statistic as red dots with errorbars.



      What do you think?






      share|improve this answer




























        0












        0








        0







        I found a solution to my problem, it is probably not very elegant:



        library (tidyverse)

        DLS_intensities_core <- DLS_intensities_core %>%
        mutate(counts = intensities * 10 )

        vectors <- DLS_intensities_core %>%
        filter(counts > 0)

        bins_v <- vectors$bins
        count_v <- vectors$counts

        violin_DLSdata <- as.tibble(rep.int(bins_v, count_v))
        violin_DLSdata$sample_name <- "core formulation 1"

        ggplot (violin_DLSdata, aes(sample_name, value)) +
        geom_violin() +
        labs(
        x = NULL, y = "size"
        ) +
        scale_y_continuous(trans = 'log10', limits = c(1, 1000))


        violin_plot 1
        for my whole dataset it looks like this:
        violin_plot with summary statistics
        I have added: summary statistic as red dots with errorbars.



        What do you think?






        share|improve this answer















        I found a solution to my problem, it is probably not very elegant:



        library (tidyverse)

        DLS_intensities_core <- DLS_intensities_core %>%
        mutate(counts = intensities * 10 )

        vectors <- DLS_intensities_core %>%
        filter(counts > 0)

        bins_v <- vectors$bins
        count_v <- vectors$counts

        violin_DLSdata <- as.tibble(rep.int(bins_v, count_v))
        violin_DLSdata$sample_name <- "core formulation 1"

        ggplot (violin_DLSdata, aes(sample_name, value)) +
        geom_violin() +
        labs(
        x = NULL, y = "size"
        ) +
        scale_y_continuous(trans = 'log10', limits = c(1, 1000))


        violin_plot 1
        for my whole dataset it looks like this:
        violin_plot with summary statistics
        I have added: summary statistic as red dots with errorbars.



        What do you think?







        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Nov 27 '18 at 14:16

























        answered Nov 26 '18 at 10:29









        DomeDome

        115




        115






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53469540%2ftransform-histogram-to-violin-plot-in-r-with-ggplot%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            404 Error Contact Form 7 ajax form submitting

            How to know if a Active Directory user can login interactively

            Refactoring coordinates for Minecraft Pi buildings written in Python