|2min||Review agenda||Mike Barton|
|5min||Introduction and motivation for getting involved||Jonathon (Jon) Bertsch|
Provide feed forward on the slides generated for the all hands.
|David E Gilbert|
|10min||Open discussion of preparing the slides for the all hands||Mike Barton|
|10min||Review points raised in the all hands meeting 10/17||Mike Barton|
|10min||Plan next steps for implementing data science at the JGI||Mike Barton|
Bill and Simona
Jon and Kecia
|5min||Review action items and plan next meeting||Jon will be out of the office the next couple of weeks|
- Anthony James Wildish will create a Slack channel for the data science initiative
- William (Bill) B Andreopoulos will follow up with Leila
- William (Bill) B Andreopoulos and Simona F Necula will develop an action plan for the educational aspect of the data science drive
- Kecia M Duffy and Jonathon (Jon) Bertsch will develop an action plan for access to data sources and documentation
Notes from the all hands meeting held 2017/10/17
- Use of Slack is proposed for communicating what is happening.
- People want to share their scripts, not just news on who does what. Scripts often get lost, especially under the purge policy.
- Visualization tools are used widely, but there is little communication about who does what.
- Wikis: JGI has 4 wikis, but it is not certain that all are used as intended, since people don't know where to go for information. What we need is something like search.
- Everyone uses the bash shell extensively.
- ML is a controversial topic at JGI - statistical learning is popular, but ML tools are a black box. It is not clear why their predictions might fail.
- The need for better documentation is agreed upon.
- For documentation, a good idea would be to hold a meeting during the quarterly NERSC maintenance where people document their tasks and code (in a wiki).
- Kaggle competitions could be used for training. If everyone participates, people will motivate one another.
- Any training has to be relevant to work at JGI, or else people won't find time for it and will lose interest.
- Please don't use wikis for code examples or scripts; use a git repository instead. Wikis are harder to maintain: you can't beat a 'git commit' once a script works, compared to opening a browser, finding the page, logging in, editing, copy/pasting, saving... Plus, good scripts will evolve, and it's important to have version control to know when a script breaks for someone or when a new feature gets added.
- Regarding data competitions, Kaggle-style or otherwise: I'd be careful about how these are introduced. If they're done on an individual basis, there are bound to be a lot of people who won't participate because they're shy. Make it group-oriented, either by JGI groups or by ad-hoc groups or something like that. I can help with the gamification if you like.
- Bill's slides mentioned making data easier to find in JAMO. What's needed there is a basic ontology that the JGI can adopt and adhere to. I'd like to work on that when I get some spare time; it's key to a lot of things that need doing.
- For group education, how about signing up for Coursera courses as a team and scheduling classes? We could have one day for watching the week's videos (maybe over lunch), then a few days later a get-together to go over the homework. That gives people time to try the exercises on their own and then learn as a crowd. Plus there are the course fora, which are an invaluable source of help. There's no reason for us to do this alone!