2017-09-27 Meeting notes - JGI data science progress check in

Date

27 Sep 2017

Attendees

Goals

To follow up on the analysis performed by everyone, and give feedback to each other in preparation for presenting at the all-hands meeting

Discussion items

Time	Item	Who	Notes
5min	Agree on scope of the meeting: for everyone to present the results they have from examining the data science survey results, even if they are still incomplete.	Mike Barton	Everyone agreed
6min	Present progress so far	Kjiersten Meri Fagnan	There are disparate data sources. Good data sources are those that are well documented; the survey results bear this out. People have a lot of interesting questions they'd like to answer.
5min	Everyone provide comments and suggestions on Kjiersten's progress	Mike Barton	Bill asked what the best way would be to achieve success with the least effort. Possibly document a way to find data. A group at the JGI could be responsible for data stewardship, because data gets out of date very quickly. Individuals in this group would manage this, and it would be where people begin when asking questions.
6min	Present progress so far	William (Bill) B Andreopoulos	Most people are interested in visualisation tools: how to visualise data, and how to make nice visualisations. Perhaps more emphasis on data, so people can consistently make nice visualisations. There are too many different data sources, so people get data from different places; there isn't a single data source for all data. 71% want to visualise and display data. Only 51% of people want a better understanding of the maths and stats. Many people are complaining that JAMO has missing data. People cannot get their work done reproducibly, and they use hacks that worked for someone else; they don't have a way to repeat what they did last time. Summary: Most people are more interested in visualisation. There are too many disparate data sources, and people don't know how they are integrated. Data is often missing from JAMO. People often cannot reproduce how they found or accessed the data they use for their analysis.
5min	Everyone provide comments and suggestions on Bill's progress	Mike Barton	Simona has seen that people access ITS data from secondary sources, and the data gets stale and out of date. There are probably many such cases at the JGI. Possible data access through web services. There are commercial tools that do data source integration. Who owns what data, and finding out where the data comes from, is a bigger problem. No one knows where to go to get the data they want, e.g. getting GOLD data from JAMO etc.
6min	Present progress so far	Kecia M Duffy	In future, collect numerical answers for the questions. Everyone could choose "most interested", so there is no ranking. For learning styles, about a third of people chose "most interested". Perhaps respondents could rank options in future. No method of learning was preferred; everyone chose all the same methods. Have to provide the data in different ways. How we pursue the training may depend on which topic is being given. We can wait to see what methods and in-house resources we have for this.
5min	Everyone provide comments and suggestions on Kecia's progress	Mike Barton	The decision of what to start with is going to drive what we choose. Michael gave the wrong goal to Kecia. Methods for teaching people the most useful skills in the style that will fit most people. Countering this, according to the survey results most styles will fit most people. Have to have a target for what we are trying to learn. Need to structure a syllabus for people; other methods can be auxiliary support for this. Knowing who people can talk to will be very helpful too. Some bioinformatics material that gives an overview of the skills people need. People wouldn't need to dig into big statistics. Could we generate a curated set of resources and background for people to use? Focused target.
6min	Present progress so far	Mike Barton	Confluence page with analysed results - Analysis of JGI data science survey
5min	Everyone provide comments and suggestions on Mike's progress	Mike Barton	Bill suggested problem-based learning. Have to focus problems on what is specific to solving problems related to the JGI. Kjiersten will find it difficult to articulate that this should be a priority if we can't relate this to what people are doing at the JGI. Some people are already taking classes themselves. Workshops may be useful, but we need to justify them and follow up that people are using these skills in their daily work. Staff who aren't directly connected to sequence data would still want to learn data science skills.
3min	Determine next step action items for presenting at all hands	Mike Barton	Have a presentation in first draft for next meeting. Key points for presentation: There are a lot of people who are motivated to improve their data analysis skills (visualisation). People want to know where to get data at the JGI. People don't feel they have time for this or that it is a priority. It will be going forward; this will become part of people's jobs.
3min	Retrospective on meeting and presenting results	Mike Barton

Action items

William (Bill) B Andreopoulos - prepare slides related to "people at the JGI are motivated to improve their data analysis skills" with Mike Barton
Kjiersten Meri Fagnan - prepare slides related to "people don't feel they have time for this or that it is a priority. It will be going forward; this will become part of people's jobs."
Kecia M Duffy - prepare slides related to "people want to know where to get data at the JGI" with Simona F Necula
Simona F Necula - prepare slides related to "people want to know where to get data at the JGI" with Kecia M Duffy
Mike Barton - prepare slides related to "people at the JGI are motivated to improve their data analysis skills" with William (Bill) B Andreopoulos
Mike Barton - create three follow-up meetings prior to the all-hands meeting

Notes / Minutes

Observations from the survey results:

Visualization and stats -- these are two sides of the same coin. 71% of respondents want better, more consistent data visualization (JMP or R); 51% want to have a better understanding of the maths and stats.

There are too many disparate data sources. There is duplication between data sources. It is not clear where/what to query for data. It would be great to have a single API that provides a view of all data sources.

JAMO very often has missing data -- it is very confusing when values are missing.

Reproducibility (and documentation): how to repeat JAMO queries? How to reproduce analyses? Instead of resorting to hacks for querying JAMO, which may change from one time to another, it would be better to have a reproducible way to query JAMO.

Documentation: it seems that there is a culture at JGI of asking the "expert" on a subject every time something is needed. There needs to be a way to document the knowledge. An expert source of knowledge (a wiki) is needed, where knowledge is documented and where people can go for answers and to reproduce their queries.
