
Date

Attendees

Goals

  • Follow up on the analysis each person has performed and give each other feedback in preparation for the all-hands presentation

Discussion items

Time | Item | Who | Notes
5 min | Agree on scope of the meeting: everyone presents the results they have from examining the data science survey, even if still incomplete. | Mike Barton
  • Everyone agreed

 

6 min | Present progress so far | Kjiersten Meri Fagnan
  • Disparate data sources
  • Good data sources are those that are well documented; the survey results bear this out
  • People have a lot of interesting questions they'd like to answer
5 min | Everyone provide comments and suggestions on Kjiersten's progress | Mike Barton
  • Bill asked what the best way would be to achieve success with the least effort. Possibly document a way to find data.
  • Suggestion: a group at the JGI responsible for data stewardship, because data gets out of date very quickly. Designated individuals would manage this.
  • This group would be the first place people go to ask questions.
6 min | Present progress so far | William (Bill) B Andreopoulos
  • Most people are interested in visualisation tools: how to visualise data, and how to make nice visualisations.
  • Perhaps more emphasis on data, so people can consistently make nice visualisations
  • There are too many different data sources, so people get data from different places.
  • There isn't a single data source for all data.
  • 71% want to visualise and display data
  • Only 51% want a better understanding of the maths and stats
  • Many people complain that JAMO has missing data
  • People cannot get their work done reproducibly and they use hacks that worked for someone else. They don't have a way to repeat what they did last time.

Summary

  • Most people are more interested in visualisation
  • Too many disparate data sources, and people don't know how they are integrated
  • Data is often missing from JAMO
  • People often cannot reproduce how they found or accessed the data they use for their analysis.
5 min | Everyone provide comments and suggestions on Bill's progress | Mike Barton
  • Simona has seen people access ITS data from secondary sources, where the data gets stale and out of date.
  • There are probably many such cases at the JGI
  • Possible data access through web services
  • There are commercial tools that do data source integration
  • Knowing who owns which data and finding out where the data comes from is a bigger problem. No one knows where to go to get the data they want, e.g. getting GOLD data from JAMO.
6 min | Present progress so far | Kecia M Duffy
  • In future, collect numerical answers for the questions. Respondents could choose "most interested" for every option, so there is no ranking. For learning styles, about a third of respondents chose "most interested". Perhaps ask for a ranking in future.
  • No method of learning was preferred; everyone chose all the same methods.
  • Have to provide the data in different ways.
  • How we pursue the training may depend on which topic is being given.
  • We can wait to see what methods and in-house resources we have for this.
5 min | Everyone provide comments and suggestions on Kecia's progress | Mike Barton
  • The decision of what to start with is going to drive what we choose.
  • Michael gave the wrong goal to Kecia.
  • Methods for teaching people the most useful skills in the style that will fit most people
  • Countering this, according to the survey results most styles will fit most people.
  • We need a target for what we are trying to learn
  • Need to structure a syllabus for people; other methods can be auxiliary support for this.
  • Knowing who people can talk to will be very helpful too.
  • Some bioinformatics material that gives an overview of the skills people need. People wouldn't need to dig into big statistics.
  • Could we generate a curated set of resources and background for people to use?
  • Focused target
6 min | Present progress so far | Mike Barton | Confluence page with analysed results: Analysis of JGI data science survey
5 min | Everyone provide comments and suggestions on Mike's progress | Mike Barton
  • Bill suggested problem-based learning
  • We have to focus on problems that are specific to the JGI
  • Kjiersten will find it difficult to articulate that this should be a priority if we can't relate this to what people are doing at the JGI.
  • Some people are already taking classes themselves.
  • Workshops may be useful, but we need to justify them and follow up to check that people are using these skills in their daily work.
  • Staff who aren't directly connected to sequence data would still want to learn data science skills.
3 min | Determine next-step action items for presenting at all hands | Mike Barton
  • Have a first draft of the presentation ready for the next meeting

Key points for presentation

  • There are a lot of people who are motivated to improve their data analysis skills (visualisation)
  • People want to know where to get data at the JGI
  • People don't feel they have time for this or that it is a priority. It will be going forward: this will become part of people's jobs.
3 min | Retrospective on meeting and presenting results | Mike Barton

Action items

 

Notes / Minutes

Observations from the survey results:

Visualization and stats: these are two sides of the same coin. 71% of respondents want better, more consistent data visualization (JMP or R); 51% want a better understanding of the maths and stats.

There are too many disparate data sources. There is duplication between data sources. It is not clear where/what to query for data. It would be great to have a single API that provides a view of all data sources.
JAMO very often has missing data; it is very confusing when values are missing.
Reproducibility (and documentation): how to repeat JAMO queries? How to reproduce analyses? Instead of resorting to hacks for querying JAMO, which may change from one time to another, it would be better to have a reproducible way to query JAMO.
Documentation: there seems to be a culture at the JGI of asking the "expert" on a subject every time something is needed. There needs to be a way to document this knowledge. An expert source of knowledge (a wiki) is needed, where knowledge is documented and people can go for answers and to reproduce their queries.