transform histogram to violin plot in r with ggplot

I am currently trying to learn r with the help of Hadley Wickham's great resources ("r for data scientists", "ggplot2 Elegant Graphics for Data Analysis"). So far I was able to find answers to all my problems there (thank you so much, Hadley!), but not this time.

Currently, I am working with data from an instrument that estimates particle size by the light the particles scatter (DLS, Zetasizer Nano, Malvern Instruments). The data extracted from this device are some summary statistics (e.g. mean particle size) and histogram data: x = size (split in bins), y = intensity [%].
Here is a tibble of one of my measurements:

   # A tibble: 70 x 3

   sample_name        intensities      bins

   <chr>                    <dbl>     <dbl>

 1 core formulation 1         0       0.4  

 2 core formulation 1         0       0.463

 3 core formulation 1         0       0.536

 4 core formulation 1         0       0.621

 5 core formulation 1         0       0.720

 6 core formulation 1         0       0.833

 7 core formulation 1         0       0.965

 8 core formulation 1         0       1.12 

 9 core formulation 1         0       1.29 

10 core formulation 1         0       1.50 

11 core formulation 1         0       1.74 

12 core formulation 1         0       2.01 

13 core formulation 1         0       2.33 

14 core formulation 1         0       2.70 

15 core formulation 1         0       3.12 

16 core formulation 1         0       3.62 

17 core formulation 1         0       4.19 

18 core formulation 1         0       4.85 

19 core formulation 1         0       5.62 

20 core formulation 1         0       6.50 

21 core formulation 1         0       7.53 

22 core formulation 1         0       8.72 

23 core formulation 1         0      10.1  

24 core formulation 1         0      11.7  

25 core formulation 1         0      13.5  

26 core formulation 1         0      15.7  

27 core formulation 1         0      18.2  

28 core formulation 1         0      21.0  

29 core formulation 1         0      24.4  

30 core formulation 1         0      28.2  

31 core formulation 1         0      32.7  

32 core formulation 1         0      37.8  

33 core formulation 1         0      43.8  

34 core formulation 1         0.2    50.8  

35 core formulation 1         1.4    58.8  

36 core formulation 1         3.7    68.1  

37 core formulation 1         6.9    78.8  

38 core formulation 1        10.2    91.3  

39 core formulation 1        12.9   106.   

40 core formulation 1        14.4   122.   

41 core formulation 1        14.4   142.   

42 core formulation 1        13     164.   

43 core formulation 1        10.3   190.   

44 core formulation 1         7.1   220.   

45 core formulation 1         3.9   255    

46 core formulation 1         1.5   295.   

47 core formulation 1         0.2   342    

48 core formulation 1         0     396.   

49 core formulation 1         0     459.   

50 core formulation 1         0     531.   

51 core formulation 1         0     615.   

52 core formulation 1         0     712.   

53 core formulation 1         0     825    

54 core formulation 1         0     955.   

55 core formulation 1         0    1106    

56 core formulation 1         0    1281    

57 core formulation 1         0    1484    

58 core formulation 1         0    1718    

59 core formulation 1         0    1990    

60 core formulation 1         0    2305    

61 core formulation 1         0    2669    

62 core formulation 1         0    3091    

63 core formulation 1         0    3580    

64 core formulation 1         0    4145    

65 core formulation 1         0    4801    

66 core formulation 1         0    5560    

67 core formulation 1         0    6439    

68 core formulation 1         0    7456    

69 core formulation 1         0    8635    

70 core formulation 1         0   10000

Here is the data produced with the dput() command:

structure(list(sample_name = c("core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1"), intensities = c(0, 

0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 

0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.2, 1.4, 3.7, 6.9, 10.2, 12.9, 

14.4, 14.4, 13, 10.3, 7.1, 3.9, 1.5, 0.2, 0, 0, 0, 0, 0, 0, 0, 

0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), bins = c(0.4, 

0.4632, 0.5365, 0.6213, 0.7195, 0.8332, 0.9649, 1.117, 1.294, 

1.499, 1.736, 2.01, 2.328, 2.696, 3.122, 3.615, 4.187, 4.849, 

5.615, 6.503, 7.531, 8.721, 10.1, 11.7, 13.54, 15.69, 18.17, 

21.04, 24.36, 28.21, 32.67, 37.84, 43.82, 50.75, 58.77, 68.06, 

78.82, 91.28, 105.7, 122.4, 141.8, 164.2, 190.1, 220.2, 255, 

295.3, 342, 396.1, 458.7, 531.2, 615.1, 712.4, 825, 955.4, 1106, 

1281, 1484, 1718, 1990, 2305, 2669, 3091, 3580, 4145, 4801, 5560, 

6439, 7456, 8635, 10000)), class = c("tbl_df", "tbl", "data.frame"

), row.names = c(NA, -70L))

I can produce a histogram with no problems from this data:

library(tidyverse)

ggplot (DLS_intensities_core, aes(bins,intensities) ) + 

  geom_line() + 

  scale_x_continuous(trans = 'log10')

line plot, intensities against bins

In order to show the overall distribution of my particle size, I would like to convert this data into a violin plot and use the summary statistics provided by the device in a second layer of my plot.

Therefore, I would like to transform this data to be able to create a violin plot from it.

I have already tried feeding it to the stat_density () argument of the violin plot but so far with no success.

Do you know how to create a violin plot from this data?

Thank you very much!

Best,

Dominik

edited Nov 25 '18 at 17:01

asked Nov 25 '18 at 16:30

Dome

115

1

Welcome to SO! Could you jump over to stackoverflow.com/tags/r/info and take a look at the suggestions for how to post data (some are in links off of that URL)? I think the SO R contributors could make quick work of this if the data was in a tad easier format to work with.

– hrbrmstr
Nov 25 '18 at 16:33

I am sorry, I will add it immediately! thank you for pointing it out!

– Dome
Nov 25 '18 at 16:39

1

Wow! Best data dump of the day here! (and, no apologies are required. the SO question entry screen leaves much to be desired). Are you just looking for a violin plot of DLS_intensities_core$bins and DLS_intensities_core$intensities or are there some other transformations you needed performed on them?

– hrbrmstr
Nov 25 '18 at 17:30

Thanks :-) I think I am just looking for a violin plot of DLS_intensities_core$bins and DLS_intensities_core$intensities. In the end, I want to produce a plot with the following specs: x-axis: particle size [nm]; y-axis: sample name (e.g. "core formulation"); 1st. layer: violin plot (from the data I presented here); 2nd. layer: summary statistics (calculated by the Zetasizer, not shown here for clarity) That means I will need to do some additional transformations for sure, but I want to try it myself first ;-)

– Dome
Nov 25 '18 at 17:43

add a comment |

   # A tibble: 70 x 3

   sample_name        intensities      bins

   <chr>                    <dbl>     <dbl>

 1 core formulation 1         0       0.4  

 2 core formulation 1         0       0.463

 3 core formulation 1         0       0.536

 4 core formulation 1         0       0.621

 5 core formulation 1         0       0.720

 6 core formulation 1         0       0.833

 7 core formulation 1         0       0.965

 8 core formulation 1         0       1.12 

 9 core formulation 1         0       1.29 

10 core formulation 1         0       1.50 

11 core formulation 1         0       1.74 

12 core formulation 1         0       2.01 

13 core formulation 1         0       2.33 

14 core formulation 1         0       2.70 

15 core formulation 1         0       3.12 

16 core formulation 1         0       3.62 

17 core formulation 1         0       4.19 

18 core formulation 1         0       4.85 

19 core formulation 1         0       5.62 

20 core formulation 1         0       6.50 

21 core formulation 1         0       7.53 

22 core formulation 1         0       8.72 

23 core formulation 1         0      10.1  

24 core formulation 1         0      11.7  

25 core formulation 1         0      13.5  

26 core formulation 1         0      15.7  

27 core formulation 1         0      18.2  

28 core formulation 1         0      21.0  

29 core formulation 1         0      24.4  

30 core formulation 1         0      28.2  

31 core formulation 1         0      32.7  

32 core formulation 1         0      37.8  

33 core formulation 1         0      43.8  

34 core formulation 1         0.2    50.8  

35 core formulation 1         1.4    58.8  

36 core formulation 1         3.7    68.1  

37 core formulation 1         6.9    78.8  

38 core formulation 1        10.2    91.3  

39 core formulation 1        12.9   106.   

40 core formulation 1        14.4   122.   

41 core formulation 1        14.4   142.   

42 core formulation 1        13     164.   

43 core formulation 1        10.3   190.   

44 core formulation 1         7.1   220.   

45 core formulation 1         3.9   255    

46 core formulation 1         1.5   295.   

47 core formulation 1         0.2   342    

48 core formulation 1         0     396.   

49 core formulation 1         0     459.   

50 core formulation 1         0     531.   

51 core formulation 1         0     615.   

52 core formulation 1         0     712.   

53 core formulation 1         0     825    

54 core formulation 1         0     955.   

55 core formulation 1         0    1106    

56 core formulation 1         0    1281    

57 core formulation 1         0    1484    

58 core formulation 1         0    1718    

59 core formulation 1         0    1990    

60 core formulation 1         0    2305    

61 core formulation 1         0    2669    

62 core formulation 1         0    3091    

63 core formulation 1         0    3580    

64 core formulation 1         0    4145    

65 core formulation 1         0    4801    

66 core formulation 1         0    5560    

67 core formulation 1         0    6439    

68 core formulation 1         0    7456    

69 core formulation 1         0    8635    

70 core formulation 1         0   10000

Here is the data produced with the dput() command:

structure(list(sample_name = c("core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1"), intensities = c(0, 

0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 

0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.2, 1.4, 3.7, 6.9, 10.2, 12.9, 

14.4, 14.4, 13, 10.3, 7.1, 3.9, 1.5, 0.2, 0, 0, 0, 0, 0, 0, 0, 

0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), bins = c(0.4, 

0.4632, 0.5365, 0.6213, 0.7195, 0.8332, 0.9649, 1.117, 1.294, 

1.499, 1.736, 2.01, 2.328, 2.696, 3.122, 3.615, 4.187, 4.849, 

5.615, 6.503, 7.531, 8.721, 10.1, 11.7, 13.54, 15.69, 18.17, 

21.04, 24.36, 28.21, 32.67, 37.84, 43.82, 50.75, 58.77, 68.06, 

78.82, 91.28, 105.7, 122.4, 141.8, 164.2, 190.1, 220.2, 255, 

295.3, 342, 396.1, 458.7, 531.2, 615.1, 712.4, 825, 955.4, 1106, 

1281, 1484, 1718, 1990, 2305, 2669, 3091, 3580, 4145, 4801, 5560, 

6439, 7456, 8635, 10000)), class = c("tbl_df", "tbl", "data.frame"

), row.names = c(NA, -70L))

I can produce a histogram with no problems from this data:

library(tidyverse)

ggplot (DLS_intensities_core, aes(bins,intensities) ) + 

  geom_line() + 

  scale_x_continuous(trans = 'log10')

line plot, intensities against bins

In order to show the overall distribution of my particle size, I would like to convert this data into a violin plot and use the summary statistics provided by the device in a second layer of my plot.

Therefore, I would like to transform this data to be able to create a violin plot from it.

I have already tried feeding it to the stat_density () argument of the violin plot but so far with no success.

Do you know how to create a violin plot from this data?

Thank you very much!

Best,

Dominik

edited Nov 25 '18 at 17:01

asked Nov 25 '18 at 16:30

Dome

115

1

Welcome to SO! Could you jump over to stackoverflow.com/tags/r/info and take a look at the suggestions for how to post data (some are in links off of that URL)? I think the SO R contributors could make quick work of this if the data was in a tad easier format to work with.

– hrbrmstr
Nov 25 '18 at 16:33

I am sorry, I will add it immediately! thank you for pointing it out!

– Dome
Nov 25 '18 at 16:39

1

Wow! Best data dump of the day here! (and, no apologies are required. the SO question entry screen leaves much to be desired). Are you just looking for a violin plot of DLS_intensities_core$bins and DLS_intensities_core$intensities or are there some other transformations you needed performed on them?

– hrbrmstr
Nov 25 '18 at 17:30

Thanks :-) I think I am just looking for a violin plot of DLS_intensities_core$bins and DLS_intensities_core$intensities. In the end, I want to produce a plot with the following specs: x-axis: particle size [nm]; y-axis: sample name (e.g. "core formulation"); 1st. layer: violin plot (from the data I presented here); 2nd. layer: summary statistics (calculated by the Zetasizer, not shown here for clarity) That means I will need to do some additional transformations for sure, but I want to try it myself first ;-)

– Dome
Nov 25 '18 at 17:43

add a comment |

   # A tibble: 70 x 3

   sample_name        intensities      bins

   <chr>                    <dbl>     <dbl>

 1 core formulation 1         0       0.4  

 2 core formulation 1         0       0.463

 3 core formulation 1         0       0.536

 4 core formulation 1         0       0.621

 5 core formulation 1         0       0.720

 6 core formulation 1         0       0.833

 7 core formulation 1         0       0.965

 8 core formulation 1         0       1.12 

 9 core formulation 1         0       1.29 

10 core formulation 1         0       1.50 

11 core formulation 1         0       1.74 

12 core formulation 1         0       2.01 

13 core formulation 1         0       2.33 

14 core formulation 1         0       2.70 

15 core formulation 1         0       3.12 

16 core formulation 1         0       3.62 

17 core formulation 1         0       4.19 

18 core formulation 1         0       4.85 

19 core formulation 1         0       5.62 

20 core formulation 1         0       6.50 

21 core formulation 1         0       7.53 

22 core formulation 1         0       8.72 

23 core formulation 1         0      10.1  

24 core formulation 1         0      11.7  

25 core formulation 1         0      13.5  

26 core formulation 1         0      15.7  

27 core formulation 1         0      18.2  

28 core formulation 1         0      21.0  

29 core formulation 1         0      24.4  

30 core formulation 1         0      28.2  

31 core formulation 1         0      32.7  

32 core formulation 1         0      37.8  

33 core formulation 1         0      43.8  

34 core formulation 1         0.2    50.8  

35 core formulation 1         1.4    58.8  

36 core formulation 1         3.7    68.1  

37 core formulation 1         6.9    78.8  

38 core formulation 1        10.2    91.3  

39 core formulation 1        12.9   106.   

40 core formulation 1        14.4   122.   

41 core formulation 1        14.4   142.   

42 core formulation 1        13     164.   

43 core formulation 1        10.3   190.   

44 core formulation 1         7.1   220.   

45 core formulation 1         3.9   255    

46 core formulation 1         1.5   295.   

47 core formulation 1         0.2   342    

48 core formulation 1         0     396.   

49 core formulation 1         0     459.   

50 core formulation 1         0     531.   

51 core formulation 1         0     615.   

52 core formulation 1         0     712.   

53 core formulation 1         0     825    

54 core formulation 1         0     955.   

55 core formulation 1         0    1106    

56 core formulation 1         0    1281    

57 core formulation 1         0    1484    

58 core formulation 1         0    1718    

59 core formulation 1         0    1990    

60 core formulation 1         0    2305    

61 core formulation 1         0    2669    

62 core formulation 1         0    3091    

63 core formulation 1         0    3580    

64 core formulation 1         0    4145    

65 core formulation 1         0    4801    

66 core formulation 1         0    5560    

67 core formulation 1         0    6439    

68 core formulation 1         0    7456    

69 core formulation 1         0    8635    

70 core formulation 1         0   10000

Here is the data produced with the dput() command:

structure(list(sample_name = c("core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1"), intensities = c(0, 

0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 

0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.2, 1.4, 3.7, 6.9, 10.2, 12.9, 

14.4, 14.4, 13, 10.3, 7.1, 3.9, 1.5, 0.2, 0, 0, 0, 0, 0, 0, 0, 

0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), bins = c(0.4, 

0.4632, 0.5365, 0.6213, 0.7195, 0.8332, 0.9649, 1.117, 1.294, 

1.499, 1.736, 2.01, 2.328, 2.696, 3.122, 3.615, 4.187, 4.849, 

5.615, 6.503, 7.531, 8.721, 10.1, 11.7, 13.54, 15.69, 18.17, 

21.04, 24.36, 28.21, 32.67, 37.84, 43.82, 50.75, 58.77, 68.06, 

78.82, 91.28, 105.7, 122.4, 141.8, 164.2, 190.1, 220.2, 255, 

295.3, 342, 396.1, 458.7, 531.2, 615.1, 712.4, 825, 955.4, 1106, 

1281, 1484, 1718, 1990, 2305, 2669, 3091, 3580, 4145, 4801, 5560, 

6439, 7456, 8635, 10000)), class = c("tbl_df", "tbl", "data.frame"

), row.names = c(NA, -70L))

I can produce a histogram with no problems from this data:

library(tidyverse)

ggplot (DLS_intensities_core, aes(bins,intensities) ) + 

  geom_line() + 

  scale_x_continuous(trans = 'log10')

line plot, intensities against bins

In order to show the overall distribution of my particle size, I would like to convert this data into a violin plot and use the summary statistics provided by the device in a second layer of my plot.

Therefore, I would like to transform this data to be able to create a violin plot from it.

I have already tried feeding it to the stat_density () argument of the violin plot but so far with no success.

Do you know how to create a violin plot from this data?

Thank you very much!

Best,

Dominik

edited Nov 25 '18 at 17:01

asked Nov 25 '18 at 16:30

Dome

115

   # A tibble: 70 x 3

   sample_name        intensities      bins

   <chr>                    <dbl>     <dbl>

 1 core formulation 1         0       0.4  

 2 core formulation 1         0       0.463

 3 core formulation 1         0       0.536

 4 core formulation 1         0       0.621

 5 core formulation 1         0       0.720

 6 core formulation 1         0       0.833

 7 core formulation 1         0       0.965

 8 core formulation 1         0       1.12 

 9 core formulation 1         0       1.29 

10 core formulation 1         0       1.50 

11 core formulation 1         0       1.74 

12 core formulation 1         0       2.01 

13 core formulation 1         0       2.33 

14 core formulation 1         0       2.70 

15 core formulation 1         0       3.12 

16 core formulation 1         0       3.62 

17 core formulation 1         0       4.19 

18 core formulation 1         0       4.85 

19 core formulation 1         0       5.62 

20 core formulation 1         0       6.50 

21 core formulation 1         0       7.53 

22 core formulation 1         0       8.72 

23 core formulation 1         0      10.1  

24 core formulation 1         0      11.7  

25 core formulation 1         0      13.5  

26 core formulation 1         0      15.7  

27 core formulation 1         0      18.2  

28 core formulation 1         0      21.0  

29 core formulation 1         0      24.4  

30 core formulation 1         0      28.2  

31 core formulation 1         0      32.7  

32 core formulation 1         0      37.8  

33 core formulation 1         0      43.8  

34 core formulation 1         0.2    50.8  

35 core formulation 1         1.4    58.8  

36 core formulation 1         3.7    68.1  

37 core formulation 1         6.9    78.8  

38 core formulation 1        10.2    91.3  

39 core formulation 1        12.9   106.   

40 core formulation 1        14.4   122.   

41 core formulation 1        14.4   142.   

42 core formulation 1        13     164.   

43 core formulation 1        10.3   190.   

44 core formulation 1         7.1   220.   

45 core formulation 1         3.9   255    

46 core formulation 1         1.5   295.   

47 core formulation 1         0.2   342    

48 core formulation 1         0     396.   

49 core formulation 1         0     459.   

50 core formulation 1         0     531.   

51 core formulation 1         0     615.   

52 core formulation 1         0     712.   

53 core formulation 1         0     825    

54 core formulation 1         0     955.   

55 core formulation 1         0    1106    

56 core formulation 1         0    1281    

57 core formulation 1         0    1484    

58 core formulation 1         0    1718    

59 core formulation 1         0    1990    

60 core formulation 1         0    2305    

61 core formulation 1         0    2669    

62 core formulation 1         0    3091    

63 core formulation 1         0    3580    

64 core formulation 1         0    4145    

65 core formulation 1         0    4801    

66 core formulation 1         0    5560    

67 core formulation 1         0    6439    

68 core formulation 1         0    7456    

69 core formulation 1         0    8635    

70 core formulation 1         0   10000

Here is the data produced with the dput() command:

structure(list(sample_name = c("core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1", "core formulation 1", 

"core formulation 1", "core formulation 1"), intensities = c(0, 

0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 

0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.2, 1.4, 3.7, 6.9, 10.2, 12.9, 

14.4, 14.4, 13, 10.3, 7.1, 3.9, 1.5, 0.2, 0, 0, 0, 0, 0, 0, 0, 

0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), bins = c(0.4, 

0.4632, 0.5365, 0.6213, 0.7195, 0.8332, 0.9649, 1.117, 1.294, 

1.499, 1.736, 2.01, 2.328, 2.696, 3.122, 3.615, 4.187, 4.849, 

5.615, 6.503, 7.531, 8.721, 10.1, 11.7, 13.54, 15.69, 18.17, 

21.04, 24.36, 28.21, 32.67, 37.84, 43.82, 50.75, 58.77, 68.06, 

78.82, 91.28, 105.7, 122.4, 141.8, 164.2, 190.1, 220.2, 255, 

295.3, 342, 396.1, 458.7, 531.2, 615.1, 712.4, 825, 955.4, 1106, 

1281, 1484, 1718, 1990, 2305, 2669, 3091, 3580, 4145, 4801, 5560, 

6439, 7456, 8635, 10000)), class = c("tbl_df", "tbl", "data.frame"

), row.names = c(NA, -70L))

I can produce a histogram with no problems from this data:

library(tidyverse)

ggplot (DLS_intensities_core, aes(bins,intensities) ) + 

  geom_line() + 

  scale_x_continuous(trans = 'log10')

line plot, intensities against bins

In order to show the overall distribution of my particle size, I would like to convert this data into a violin plot and use the summary statistics provided by the device in a second layer of my plot.

Therefore, I would like to transform this data to be able to create a violin plot from it.

I have already tried feeding it to the stat_density () argument of the violin plot but so far with no success.

Do you know how to create a violin plot from this data?

Thank you very much!

Best,

Dominik

r ggplot2 histogram transformation violin-plot

edited Nov 25 '18 at 17:01

asked Nov 25 '18 at 16:30

Dome

115

edited Nov 25 '18 at 17:01

asked Nov 25 '18 at 16:30

Dome

115

edited Nov 25 '18 at 17:01

asked Nov 25 '18 at 16:30

Dome

115

asked Nov 25 '18 at 16:30

Dome

115

asked Nov 25 '18 at 16:30

Dome

115

1

Welcome to SO! Could you jump over to stackoverflow.com/tags/r/info and take a look at the suggestions for how to post data (some are in links off of that URL)? I think the SO R contributors could make quick work of this if the data was in a tad easier format to work with.

– hrbrmstr
Nov 25 '18 at 16:33

I am sorry, I will add it immediately! thank you for pointing it out!

– Dome
Nov 25 '18 at 16:39

1

Wow! Best data dump of the day here! (and, no apologies are required. the SO question entry screen leaves much to be desired). Are you just looking for a violin plot of DLS_intensities_core$bins and DLS_intensities_core$intensities or are there some other transformations you needed performed on them?

– hrbrmstr
Nov 25 '18 at 17:30

Thanks :-) I think I am just looking for a violin plot of DLS_intensities_core$bins and DLS_intensities_core$intensities. In the end, I want to produce a plot with the following specs: x-axis: particle size [nm]; y-axis: sample name (e.g. "core formulation"); 1st. layer: violin plot (from the data I presented here); 2nd. layer: summary statistics (calculated by the Zetasizer, not shown here for clarity) That means I will need to do some additional transformations for sure, but I want to try it myself first ;-)

– Dome
Nov 25 '18 at 17:43

add a comment |

1

Welcome to SO! Could you jump over to stackoverflow.com/tags/r/info and take a look at the suggestions for how to post data (some are in links off of that URL)? I think the SO R contributors could make quick work of this if the data was in a tad easier format to work with.

– hrbrmstr
Nov 25 '18 at 16:33

I am sorry, I will add it immediately! thank you for pointing it out!

– Dome
Nov 25 '18 at 16:39

1

Wow! Best data dump of the day here! (and, no apologies are required. the SO question entry screen leaves much to be desired). Are you just looking for a violin plot of DLS_intensities_core$bins and DLS_intensities_core$intensities or are there some other transformations you needed performed on them?

– hrbrmstr
Nov 25 '18 at 17:30

Thanks :-) I think I am just looking for a violin plot of DLS_intensities_core$bins and DLS_intensities_core$intensities. In the end, I want to produce a plot with the following specs: x-axis: particle size [nm]; y-axis: sample name (e.g. "core formulation"); 1st. layer: violin plot (from the data I presented here); 2nd. layer: summary statistics (calculated by the Zetasizer, not shown here for clarity) That means I will need to do some additional transformations for sure, but I want to try it myself first ;-)

– Dome
Nov 25 '18 at 17:43

Welcome to SO! Could you jump over to stackoverflow.com/tags/r/info and take a look at the suggestions for how to post data (some are in links off of that URL)? I think the SO R contributors could make quick work of this if the data was in a tad easier format to work with.

– hrbrmstr
Nov 25 '18 at 16:33

I am sorry, I will add it immediately! thank you for pointing it out!

– Dome
Nov 25 '18 at 16:39

Wow! Best data dump of the day here! (and, no apologies are required. the SO question entry screen leaves much to be desired). Are you just looking for a violin plot of DLS_intensities_core$bins and DLS_intensities_core$intensities or are there some other transformations you needed performed on them?

– hrbrmstr
Nov 25 '18 at 17:30

Thanks :-) I think I am just looking for a violin plot of DLS_intensities_core$bins and DLS_intensities_core$intensities. In the end, I want to produce a plot with the following specs: x-axis: particle size [nm]; y-axis: sample name (e.g. "core formulation"); 1st. layer: violin plot (from the data I presented here); 2nd. layer: summary statistics (calculated by the Zetasizer, not shown here for clarity) That means I will need to do some additional transformations for sure, but I want to try it myself first ;-)

– Dome
Nov 25 '18 at 17:43

add a comment |

2 Answers
2

active

oldest

votes

I'll update this (if needed) after your reply to the second comment. You can get violin plots of both bins and intensities with:

library(hrbrthemes)



gather(DLS_intensities_core, measure, value, -sample_name) %>% 

  ggplot(aes(measure, value)) +

  geom_violin(scale = "count") +

  scale_y_comma() +

  facet_wrap(~measure, scales="free") +

  labs(

    x = NULL, y = "A better label than this",

    title = "A better title than this",

    caption = "NOTE: Free Y scales"

  ) +

  theme_ipsum_rc(grid="Y") +

  theme(axis.text.x = element_blank())

enter image description here

I generally like to layer in the points as well:

gather(DLS_intensities_core, measure, value, -sample_name) %>% 

  ggplot(aes(measure, value)) +

  geom_violin(scale = "count") +

  ggbeeswarm::geom_quasirandom() +

  scale_y_comma() +

  facet_wrap(~measure, scales="free") +

  labs(

    x = NULL, y = "A better label than this",

    title = "A better title than this",

    caption = "NOTE: Free Y scales"

  ) +

  theme_ipsum_rc(grid="Y") +

  theme(axis.text.x = element_blank())

enter image description here

Per your comment, perhaps this might be a better way to show the bins distribution along with the relationship with intensities:

library(hrbrthemes)

library(tidyverse)



ggplot(DLS_intensities_core, aes(x="", bins)) +

  geom_violin(scale = "count") +

  ggbeeswarm::geom_quasirandom(

    aes(size = intensities, fill = intensities), shape = 21

  ) +

  scale_y_comma(trans="log10") +

  viridis::scale_fill_viridis(direction = -1, trans = "log1p") +

  scale_size_continuous(trans = "log1p", range = c(2, 10)) +

  guides(fill = guide_legend()) +

  labs(

    x = NULL, y = "A better label than this",

    title = "A better title than this"

  ) +

  theme_ipsum_rc(grid="Y")

enter image description here

You'd have to do some other, custom transformation to try to get the violin shape to vary with the intensities (and it wouldn't really be reflecting the distribution at that point).

edited Nov 25 '18 at 18:22

answered Nov 25 '18 at 17:36

hrbrmstr

61.3k691152

thank you very much for your plots and the associated code, I like the theme of your plots very much! And the addition of dots to violin plots is a good idea :-) The first plot (bins) is really close to what I want to have: Size (or bins) on the y -axis (in log10 scale) and the expansion in x-direction of the plot is the value of the intensity (not the number of cases). Does this help or is it just the wrong plot for this idea?

– Dome
Nov 25 '18 at 18:11

See if the latest update does a bit more of what you want.

– hrbrmstr
Nov 25 '18 at 18:23

yes, this goes in the right direction! However, I would like the intensities to be depicted by the extension of the violin plot in x direction (in a way that the biggest extension of the violin plot is at around 100 on the y-axis, at the position that you printed the big dots at.).

– Dome
Nov 26 '18 at 9:18

I had another thought while in the shower: Maybe my tibble is not in the right format for this type of plot. Since the violin plot's stat counts rows and depicts a summary, the most straight forward way would be to multiplay the bin size by intensity and create the corresponding amount of rows with this precise (bin-) size. E.g. if the size is 100 and the corresponding intensity 20, then I would create 20 new rows with bin = 100. I will try this approach today and let you know what I found :-) Thanks again for your help! :-)

– Dome
Nov 26 '18 at 9:18

add a comment |

I found a solution to my problem, it is probably not very elegant:

library (tidyverse)



DLS_intensities_core <- DLS_intensities_core %>% 

  mutate(counts = intensities * 10 )



vectors <- DLS_intensities_core %>%

  filter(counts > 0) 



bins_v <- vectors$bins

count_v <- vectors$counts



violin_DLSdata <- as.tibble(rep.int(bins_v, count_v))

violin_DLSdata$sample_name <- "core formulation 1"



ggplot (violin_DLSdata, aes(sample_name, value)) + 

  geom_violin() + 

  labs(

    x = NULL, y = "size"

  ) +

  scale_y_continuous(trans = 'log10', limits = c(1, 1000))

violin_plot 1
for my whole dataset it looks like this:
violin_plot with summary statistics
I have added: summary statistic as red dots with errorbars.

What do you think?

edited Nov 27 '18 at 14:16

answered Nov 26 '18 at 10:29

Dome

115

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53469540%2ftransform-histogram-to-violin-plot-in-r-with-ggplot%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

I'll update this (if needed) after your reply to the second comment. You can get violin plots of both bins and intensities with:

library(hrbrthemes)



gather(DLS_intensities_core, measure, value, -sample_name) %>% 

  ggplot(aes(measure, value)) +

  geom_violin(scale = "count") +

  scale_y_comma() +

  facet_wrap(~measure, scales="free") +

  labs(

    x = NULL, y = "A better label than this",

    title = "A better title than this",

    caption = "NOTE: Free Y scales"

  ) +

  theme_ipsum_rc(grid="Y") +

  theme(axis.text.x = element_blank())

enter image description here

I generally like to layer in the points as well:

gather(DLS_intensities_core, measure, value, -sample_name) %>% 

  ggplot(aes(measure, value)) +

  geom_violin(scale = "count") +

  ggbeeswarm::geom_quasirandom() +

  scale_y_comma() +

  facet_wrap(~measure, scales="free") +

  labs(

    x = NULL, y = "A better label than this",

    title = "A better title than this",

    caption = "NOTE: Free Y scales"

  ) +

  theme_ipsum_rc(grid="Y") +

  theme(axis.text.x = element_blank())

enter image description here

Per your comment, perhaps this might be a better way to show the bins distribution along with the relationship with intensities:

library(hrbrthemes)

library(tidyverse)



ggplot(DLS_intensities_core, aes(x="", bins)) +

  geom_violin(scale = "count") +

  ggbeeswarm::geom_quasirandom(

    aes(size = intensities, fill = intensities), shape = 21

  ) +

  scale_y_comma(trans="log10") +

  viridis::scale_fill_viridis(direction = -1, trans = "log1p") +

  scale_size_continuous(trans = "log1p", range = c(2, 10)) +

  guides(fill = guide_legend()) +

  labs(

    x = NULL, y = "A better label than this",

    title = "A better title than this"

  ) +

  theme_ipsum_rc(grid="Y")

enter image description here

You'd have to do some other, custom transformation to try to get the violin shape to vary with the intensities (and it wouldn't really be reflecting the distribution at that point).

edited Nov 25 '18 at 18:22

answered Nov 25 '18 at 17:36

hrbrmstr

61.3k691152

thank you very much for your plots and the associated code, I like the theme of your plots very much! And the addition of dots to violin plots is a good idea :-) The first plot (bins) is really close to what I want to have: Size (or bins) on the y -axis (in log10 scale) and the expansion in x-direction of the plot is the value of the intensity (not the number of cases). Does this help or is it just the wrong plot for this idea?

– Dome
Nov 25 '18 at 18:11

See if the latest update does a bit more of what you want.

– hrbrmstr
Nov 25 '18 at 18:23

yes, this goes in the right direction! However, I would like the intensities to be depicted by the extension of the violin plot in x direction (in a way that the biggest extension of the violin plot is at around 100 on the y-axis, at the position that you printed the big dots at.).

– Dome
Nov 26 '18 at 9:18

I had another thought while in the shower: Maybe my tibble is not in the right format for this type of plot. Since the violin plot's stat counts rows and depicts a summary, the most straight forward way would be to multiplay the bin size by intensity and create the corresponding amount of rows with this precise (bin-) size. E.g. if the size is 100 and the corresponding intensity 20, then I would create 20 new rows with bin = 100. I will try this approach today and let you know what I found :-) Thanks again for your help! :-)

– Dome
Nov 26 '18 at 9:18

add a comment |

I'll update this (if needed) after your reply to the second comment. You can get violin plots of both bins and intensities with:

library(hrbrthemes)



gather(DLS_intensities_core, measure, value, -sample_name) %>% 

  ggplot(aes(measure, value)) +

  geom_violin(scale = "count") +

  scale_y_comma() +

  facet_wrap(~measure, scales="free") +

  labs(

    x = NULL, y = "A better label than this",

    title = "A better title than this",

    caption = "NOTE: Free Y scales"

  ) +

  theme_ipsum_rc(grid="Y") +

  theme(axis.text.x = element_blank())

enter image description here

I generally like to layer in the points as well:

gather(DLS_intensities_core, measure, value, -sample_name) %>% 

  ggplot(aes(measure, value)) +

  geom_violin(scale = "count") +

  ggbeeswarm::geom_quasirandom() +

  scale_y_comma() +

  facet_wrap(~measure, scales="free") +

  labs(

    x = NULL, y = "A better label than this",

    title = "A better title than this",

    caption = "NOTE: Free Y scales"

  ) +

  theme_ipsum_rc(grid="Y") +

  theme(axis.text.x = element_blank())

enter image description here

Per your comment, perhaps this might be a better way to show the bins distribution along with the relationship with intensities:

library(hrbrthemes)

library(tidyverse)



ggplot(DLS_intensities_core, aes(x="", bins)) +

  geom_violin(scale = "count") +

  ggbeeswarm::geom_quasirandom(

    aes(size = intensities, fill = intensities), shape = 21

  ) +

  scale_y_comma(trans="log10") +

  viridis::scale_fill_viridis(direction = -1, trans = "log1p") +

  scale_size_continuous(trans = "log1p", range = c(2, 10)) +

  guides(fill = guide_legend()) +

  labs(

    x = NULL, y = "A better label than this",

    title = "A better title than this"

  ) +

  theme_ipsum_rc(grid="Y")

enter image description here

You'd have to do some other, custom transformation to try to get the violin shape to vary with the intensities (and it wouldn't really be reflecting the distribution at that point).

edited Nov 25 '18 at 18:22

answered Nov 25 '18 at 17:36

hrbrmstr

61.3k691152

thank you very much for your plots and the associated code, I like the theme of your plots very much! And the addition of dots to violin plots is a good idea :-) The first plot (bins) is really close to what I want to have: Size (or bins) on the y -axis (in log10 scale) and the expansion in x-direction of the plot is the value of the intensity (not the number of cases). Does this help or is it just the wrong plot for this idea?

– Dome
Nov 25 '18 at 18:11

See if the latest update does a bit more of what you want.

– hrbrmstr
Nov 25 '18 at 18:23

yes, this goes in the right direction! However, I would like the intensities to be depicted by the extension of the violin plot in x direction (in a way that the biggest extension of the violin plot is at around 100 on the y-axis, at the position that you printed the big dots at.).

– Dome
Nov 26 '18 at 9:18

I had another thought while in the shower: Maybe my tibble is not in the right format for this type of plot. Since the violin plot's stat counts rows and depicts a summary, the most straight forward way would be to multiplay the bin size by intensity and create the corresponding amount of rows with this precise (bin-) size. E.g. if the size is 100 and the corresponding intensity 20, then I would create 20 new rows with bin = 100. I will try this approach today and let you know what I found :-) Thanks again for your help! :-)

– Dome
Nov 26 '18 at 9:18

add a comment |

I'll update this (if needed) after your reply to the second comment. You can get violin plots of both bins and intensities with:

library(hrbrthemes)



gather(DLS_intensities_core, measure, value, -sample_name) %>% 

  ggplot(aes(measure, value)) +

  geom_violin(scale = "count") +

  scale_y_comma() +

  facet_wrap(~measure, scales="free") +

  labs(

    x = NULL, y = "A better label than this",

    title = "A better title than this",

    caption = "NOTE: Free Y scales"

  ) +

  theme_ipsum_rc(grid="Y") +

  theme(axis.text.x = element_blank())

enter image description here

I generally like to layer in the points as well:

gather(DLS_intensities_core, measure, value, -sample_name) %>% 

  ggplot(aes(measure, value)) +

  geom_violin(scale = "count") +

  ggbeeswarm::geom_quasirandom() +

  scale_y_comma() +

  facet_wrap(~measure, scales="free") +

  labs(

    x = NULL, y = "A better label than this",

    title = "A better title than this",

    caption = "NOTE: Free Y scales"

  ) +

  theme_ipsum_rc(grid="Y") +

  theme(axis.text.x = element_blank())

enter image description here

Per your comment, perhaps this might be a better way to show the bins distribution along with the relationship with intensities:

library(hrbrthemes)

library(tidyverse)



ggplot(DLS_intensities_core, aes(x="", bins)) +

  geom_violin(scale = "count") +

  ggbeeswarm::geom_quasirandom(

    aes(size = intensities, fill = intensities), shape = 21

  ) +

  scale_y_comma(trans="log10") +

  viridis::scale_fill_viridis(direction = -1, trans = "log1p") +

  scale_size_continuous(trans = "log1p", range = c(2, 10)) +

  guides(fill = guide_legend()) +

  labs(

    x = NULL, y = "A better label than this",

    title = "A better title than this"

  ) +

  theme_ipsum_rc(grid="Y")

enter image description here

You'd have to do some other, custom transformation to try to get the violin shape to vary with the intensities (and it wouldn't really be reflecting the distribution at that point).

edited Nov 25 '18 at 18:22

answered Nov 25 '18 at 17:36

hrbrmstr

61.3k691152

I'll update this (if needed) after your reply to the second comment. You can get violin plots of both bins and intensities with:

library(hrbrthemes)



gather(DLS_intensities_core, measure, value, -sample_name) %>% 

  ggplot(aes(measure, value)) +

  geom_violin(scale = "count") +

  scale_y_comma() +

  facet_wrap(~measure, scales="free") +

  labs(

    x = NULL, y = "A better label than this",

    title = "A better title than this",

    caption = "NOTE: Free Y scales"

  ) +

  theme_ipsum_rc(grid="Y") +

  theme(axis.text.x = element_blank())

enter image description here

I generally like to layer in the points as well:

gather(DLS_intensities_core, measure, value, -sample_name) %>% 

  ggplot(aes(measure, value)) +

  geom_violin(scale = "count") +

  ggbeeswarm::geom_quasirandom() +

  scale_y_comma() +

  facet_wrap(~measure, scales="free") +

  labs(

    x = NULL, y = "A better label than this",

    title = "A better title than this",

    caption = "NOTE: Free Y scales"

  ) +

  theme_ipsum_rc(grid="Y") +

  theme(axis.text.x = element_blank())

enter image description here

Per your comment, perhaps this might be a better way to show the bins distribution along with the relationship with intensities:

library(hrbrthemes)

library(tidyverse)



ggplot(DLS_intensities_core, aes(x="", bins)) +

  geom_violin(scale = "count") +

  ggbeeswarm::geom_quasirandom(

    aes(size = intensities, fill = intensities), shape = 21

  ) +

  scale_y_comma(trans="log10") +

  viridis::scale_fill_viridis(direction = -1, trans = "log1p") +

  scale_size_continuous(trans = "log1p", range = c(2, 10)) +

  guides(fill = guide_legend()) +

  labs(

    x = NULL, y = "A better label than this",

    title = "A better title than this"

  ) +

  theme_ipsum_rc(grid="Y")

enter image description here

You'd have to do some other, custom transformation to try to get the violin shape to vary with the intensities (and it wouldn't really be reflecting the distribution at that point).

edited Nov 25 '18 at 18:22

answered Nov 25 '18 at 17:36

hrbrmstr

61.3k691152

edited Nov 25 '18 at 18:22

answered Nov 25 '18 at 17:36

hrbrmstr

61.3k691152

answered Nov 25 '18 at 17:36

hrbrmstr

61.3k691152

answered Nov 25 '18 at 17:36

hrbrmstr

61.3k691152

thank you very much for your plots and the associated code, I like the theme of your plots very much! And the addition of dots to violin plots is a good idea :-) The first plot (bins) is really close to what I want to have: Size (or bins) on the y -axis (in log10 scale) and the expansion in x-direction of the plot is the value of the intensity (not the number of cases). Does this help or is it just the wrong plot for this idea?

– Dome
Nov 25 '18 at 18:11

See if the latest update does a bit more of what you want.

– hrbrmstr
Nov 25 '18 at 18:23

yes, this goes in the right direction! However, I would like the intensities to be depicted by the extension of the violin plot in x direction (in a way that the biggest extension of the violin plot is at around 100 on the y-axis, at the position that you printed the big dots at.).

– Dome
Nov 26 '18 at 9:18

I had another thought while in the shower: Maybe my tibble is not in the right format for this type of plot. Since the violin plot's stat counts rows and depicts a summary, the most straight forward way would be to multiplay the bin size by intensity and create the corresponding amount of rows with this precise (bin-) size. E.g. if the size is 100 and the corresponding intensity 20, then I would create 20 new rows with bin = 100. I will try this approach today and let you know what I found :-) Thanks again for your help! :-)

– Dome
Nov 26 '18 at 9:18

add a comment |

thank you very much for your plots and the associated code, I like the theme of your plots very much! And the addition of dots to violin plots is a good idea :-) The first plot (bins) is really close to what I want to have: Size (or bins) on the y -axis (in log10 scale) and the expansion in x-direction of the plot is the value of the intensity (not the number of cases). Does this help or is it just the wrong plot for this idea?

– Dome
Nov 25 '18 at 18:11

See if the latest update does a bit more of what you want.

– hrbrmstr
Nov 25 '18 at 18:23

yes, this goes in the right direction! However, I would like the intensities to be depicted by the extension of the violin plot in x direction (in a way that the biggest extension of the violin plot is at around 100 on the y-axis, at the position that you printed the big dots at.).

– Dome
Nov 26 '18 at 9:18

I had another thought while in the shower: Maybe my tibble is not in the right format for this type of plot. Since the violin plot's stat counts rows and depicts a summary, the most straight forward way would be to multiplay the bin size by intensity and create the corresponding amount of rows with this precise (bin-) size. E.g. if the size is 100 and the corresponding intensity 20, then I would create 20 new rows with bin = 100. I will try this approach today and let you know what I found :-) Thanks again for your help! :-)

– Dome
Nov 26 '18 at 9:18

thank you very much for your plots and the associated code, I like the theme of your plots very much! And the addition of dots to violin plots is a good idea :-) The first plot (bins) is really close to what I want to have: Size (or bins) on the y -axis (in log10 scale) and the expansion in x-direction of the plot is the value of the intensity (not the number of cases). Does this help or is it just the wrong plot for this idea?

– Dome
Nov 25 '18 at 18:11

See if the latest update does a bit more of what you want.

– hrbrmstr
Nov 25 '18 at 18:23

yes, this goes in the right direction! However, I would like the intensities to be depicted by the extension of the violin plot in x direction (in a way that the biggest extension of the violin plot is at around 100 on the y-axis, at the position that you printed the big dots at.).

– Dome
Nov 26 '18 at 9:18

I had another thought while in the shower: Maybe my tibble is not in the right format for this type of plot. Since the violin plot's stat counts rows and depicts a summary, the most straight forward way would be to multiplay the bin size by intensity and create the corresponding amount of rows with this precise (bin-) size. E.g. if the size is 100 and the corresponding intensity 20, then I would create 20 new rows with bin = 100. I will try this approach today and let you know what I found :-) Thanks again for your help! :-)

– Dome
Nov 26 '18 at 9:18

add a comment |

I found a solution to my problem, it is probably not very elegant:

library (tidyverse)



DLS_intensities_core <- DLS_intensities_core %>% 

  mutate(counts = intensities * 10 )



vectors <- DLS_intensities_core %>%

  filter(counts > 0) 



bins_v <- vectors$bins

count_v <- vectors$counts



violin_DLSdata <- as.tibble(rep.int(bins_v, count_v))

violin_DLSdata$sample_name <- "core formulation 1"



ggplot (violin_DLSdata, aes(sample_name, value)) + 

  geom_violin() + 

  labs(

    x = NULL, y = "size"

  ) +

  scale_y_continuous(trans = 'log10', limits = c(1, 1000))

violin_plot 1
for my whole dataset it looks like this:
violin_plot with summary statistics
I have added: summary statistic as red dots with errorbars.

What do you think?

edited Nov 27 '18 at 14:16

answered Nov 26 '18 at 10:29

Dome

115

add a comment |

I found a solution to my problem, it is probably not very elegant:

library (tidyverse)



DLS_intensities_core <- DLS_intensities_core %>% 

  mutate(counts = intensities * 10 )



vectors <- DLS_intensities_core %>%

  filter(counts > 0) 



bins_v <- vectors$bins

count_v <- vectors$counts



violin_DLSdata <- as.tibble(rep.int(bins_v, count_v))

violin_DLSdata$sample_name <- "core formulation 1"



ggplot (violin_DLSdata, aes(sample_name, value)) + 

  geom_violin() + 

  labs(

    x = NULL, y = "size"

  ) +

  scale_y_continuous(trans = 'log10', limits = c(1, 1000))

violin_plot 1
for my whole dataset it looks like this:
violin_plot with summary statistics
I have added: summary statistic as red dots with errorbars.

What do you think?

edited Nov 27 '18 at 14:16

answered Nov 26 '18 at 10:29

Dome

115

add a comment |

I found a solution to my problem, it is probably not very elegant:

library (tidyverse)



DLS_intensities_core <- DLS_intensities_core %>% 

  mutate(counts = intensities * 10 )



vectors <- DLS_intensities_core %>%

  filter(counts > 0) 



bins_v <- vectors$bins

count_v <- vectors$counts



violin_DLSdata <- as.tibble(rep.int(bins_v, count_v))

violin_DLSdata$sample_name <- "core formulation 1"



ggplot (violin_DLSdata, aes(sample_name, value)) + 

  geom_violin() + 

  labs(

    x = NULL, y = "size"

  ) +

  scale_y_continuous(trans = 'log10', limits = c(1, 1000))

violin_plot 1
for my whole dataset it looks like this:
violin_plot with summary statistics
I have added: summary statistic as red dots with errorbars.

What do you think?

edited Nov 27 '18 at 14:16

answered Nov 26 '18 at 10:29

Dome

115

I found a solution to my problem, it is probably not very elegant:

library (tidyverse)



DLS_intensities_core <- DLS_intensities_core %>% 

  mutate(counts = intensities * 10 )



vectors <- DLS_intensities_core %>%

  filter(counts > 0) 



bins_v <- vectors$bins

count_v <- vectors$counts



violin_DLSdata <- as.tibble(rep.int(bins_v, count_v))

violin_DLSdata$sample_name <- "core formulation 1"



ggplot (violin_DLSdata, aes(sample_name, value)) + 

  geom_violin() + 

  labs(

    x = NULL, y = "size"

  ) +

  scale_y_continuous(trans = 'log10', limits = c(1, 1000))

violin_plot 1
for my whole dataset it looks like this:
violin_plot with summary statistics
I have added: summary statistic as red dots with errorbars.

What do you think?

edited Nov 27 '18 at 14:16

answered Nov 26 '18 at 10:29

Dome

115

edited Nov 27 '18 at 14:16

answered Nov 26 '18 at 10:29

Dome

115

answered Nov 26 '18 at 10:29

Dome

115

answered Nov 26 '18 at 10:29

Dome

115

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

4,vD29o3hI1vz0H BBGlrd0nurkPTJvK,GY HLhm4bVNS G6e1zNi05X,rMBshe7ZE 6,pXkzJldAX2OHC lOGXUIxGuH9LiKeovjG5Y

搜尋此網誌

Tukukkk