Insights from Open Data Science Conference 2021 West
I attended Open Data Science Conference (ODSC) West, held November 15th through 18th, 2021, in San Francisco, California. ODSC is an annual tech conference that gathers leading thinkers and experts in data science, artificial intelligence, and data operations. In this article, I'll highlight some of the concepts and ideas shared at this year's conference.
DataOps for the Modern Computer Vision Stack
Source: Slides
Data Advocate James Le introduces an ML-powered DataOps workflow and its benefits in computer vision
In computer vision, DataOps means building a high-quality training dataset. This entails asking questions such as: What data? Where to find that data? How much data? How to validate that data? What defines quality? Where to store that data? How to organize that data?
DataOps focuses on improving the workflows of the data systems and analytic models used by data analysts and data engineers. By applying machine learning to develop best practices for big data systems, organizations can make core workflows better, faster, and cheaper.
In his presentation, James Le described how DataOps practices fit naturally within the computer vision context and the ways in which systems can be composed into platforms that provide value for a computer vision organization. James closed his presentation by describing the future of modern machine learning systems and introducing the Canonical Stack (CS) methodology as a blueprint for delivering better ML infrastructure.
Machine Learning With Graphs: Going Beyond Tabular Data
Source: GitHub
Data Science Advocate Clair J. Sullivan introduces Graph Theory and how it can be applied to Machine Learning challenges
... by considering the relationships among those individual data points, models can be significantly enhanced and measurable improvements can be made to the appropriate metrics of that model. Such use cases can include common data science and machine learning tasks such as churn prediction and automated recommendation engines.
Dr. Sullivan introduced a graph-theory approach to machine learning that can improve model results. When using graphs, it is possible to enrich the data with the relationships between data points, for example through graph embeddings. These additional features can help to reduce model noise and boost accuracy.
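As a minimal sketch of the idea (my own illustration, not the presenter's code, and assuming networkx and scikit-learn; the toy graph, features, and labels are hypothetical), graph-derived features such as node degree and PageRank can be concatenated onto ordinary tabular features before training:

```python
# Sketch: enriching tabular features with graph-derived features.
# The customer graph, tabular columns, and labels are all toy data.
import networkx as nx
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy customer graph: nodes are customers, edges are "referred" links.
G = nx.Graph()
G.add_edges_from([(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)])

# Plain tabular features (e.g., tenure in months, monthly spend).
tabular = np.array([[12, 50.0], [3, 20.0], [24, 80.0], [6, 35.0], [1, 10.0]])
labels = np.array([0, 1, 0, 0, 1])  # toy churn labels

# Graph-derived features: degree and PageRank capture each node's
# position in the relationship structure.
pagerank = nx.pagerank(G)
graph_feats = np.array([[G.degree(n), pagerank[n]] for n in sorted(G.nodes)])

# Concatenate graph features onto the tabular ones before training.
X = np.hstack([tabular, graph_feats])
model = RandomForestClassifier(random_state=42).fit(X, labels)
```

The same pattern scales up to richer graph features such as learned node embeddings, which is where the "graph embeddings" mentioned above come in.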
Tutorial: Using BERT with PyTorch
Source: Colab
Author and founder Chris McCormick demonstrates the Hugging Face PyTorch library for fine-tuning natural language processing models
In this tutorial, we will use BERT to train a text classifier. Specifically, we will take the pre-trained BERT model, add an untrained layer of neurons on the end, and train the new model for our classification task.
BERT (Bidirectional Encoder Representations from Transformers) is a powerful transfer learning model used in natural language processing (NLP). BERT can be used to create word embeddings that help machines understand the nuances of words in sentences. It has been used extensively to create question answering systems, improve search accuracy, and help fight online hate speech and bullying.
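For a concrete sense of what these contextual embeddings look like, here is a minimal sketch (my illustration, not from the talk) using the Hugging Face transformers library; the example sentence is arbitrary:

```python
# Sketch: extracting contextual token embeddings from pre-trained BERT.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("The bank raised interest rates.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One contextual vector per input token, shaped (batch, seq_len, 768).
# Because the vectors depend on the surrounding sentence, the same word
# gets different embeddings in different contexts.
token_embeddings = outputs.last_hidden_state
```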
Chris McCormick showed how to use BERT to create custom sentence classification applications by adding additional layers to the pre-trained model. In deep learning, this process is called fine-tuning. With BERT and only about 30 minutes of training investment, Chris showed that it is possible to achieve a significant improvement in model performance.
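A heavily condensed sketch of that fine-tuning loop is below, assuming the transformers and torch packages; the toy texts, labels, and hyperparameters are placeholders, and the full recipe (data loading, learning-rate scheduling, evaluation) lives in the original Colab notebook:

```python
# Sketch: fine-tuning BERT for binary sentence classification.
import torch
from torch.optim import AdamW
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# num_labels=2 adds an untrained classification head on top of BERT.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

texts = ["great movie", "terrible plot"]  # toy training data
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, return_tensors="pt")

optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few passes over the toy batch
    optimizer.zero_grad()
    outputs = model(**batch, labels=labels)  # loss computed internally
    outputs.loss.backward()
    optimizer.step()
```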
Practical Machine Learning for Computer Vision
Source: GitHub
O'Reilly authors reveal best practices in Deep Learning Image Classification
Machine learning on images is revolutionizing healthcare, manufacturing, retail, and many other sectors. Many previously difficult problems can now be solved by training machine learning models to identify objects in images.
Practical Machine Learning for Computer Vision authors Lakshmanan and Gillard presented a three-hour workshop on implementing computer vision applications, describing best practices along the way. In the tutorial, they showed how to prepare ML datasets, train a transfer learning model, export a saved model instance, deploy the results, and fine-tune the pipeline.
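As a minimal transfer learning sketch in the spirit of the workshop (not the authors' code; the backbone choice, class count, and commented-out dataset names are my own assumptions), a pretrained base can be frozen and topped with a new classification head:

```python
# Sketch: transfer learning with a frozen pretrained backbone in Keras.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet"
)
base.trainable = False  # freeze the pretrained weights

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),  # 5 example classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# train_ds / val_ds would be tf.data.Dataset objects of (image, label)
# pairs, e.g. from tf.keras.utils.image_dataset_from_directory(...).
# model.fit(train_ds, validation_data=val_ds, epochs=5)
```

Once the new head converges, unfreezing some of the base layers at a low learning rate is the usual fine-tuning step.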
Amongst the many helpful things shared, the authors provided a number of tips specific to CNN models. Here are a few of their suggestions for maximizing results (see the combined sketch after this list):
- L1 and L2 regularization to penalize large weights
- Batch normalization to normalize layer inputs across each mini-batch
- ReLU activation to introduce nonlinearity and help avoid vanishing gradients
- Dropout to randomly deactivate nodes during training so the network doesn't rely on single paths
- Early stopping to prevent the model from overfitting
- Hyperparameter tuning, training the network multiple times with different configurations
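Several of these techniques can be combined in a single model definition. The following is an illustrative sketch of my own (assuming Keras; the layer sizes, input shape, and 10-class output are arbitrary), not code from the workshop:

```python
# Sketch: a small CNN using L2 regularization, batch normalization,
# ReLU, dropout, and early stopping together.
import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(32, 3, padding="same",
                  kernel_regularizer=regularizers.l2(1e-4)),  # penalize large weights
    layers.BatchNormalization(),  # normalize activations per mini-batch
    layers.Activation("relu"),    # nonlinearity; helps avoid vanishing gradients
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),          # randomly deactivate units during training
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Early stopping halts training once validation loss stops improving.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True
)
# model.fit(x_train, y_train, validation_split=0.2,
#           epochs=50, callbacks=[early_stop])
```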