Brandon Harper
Dec 14, 2021

Insights from Open Data Science Conference 2021 West

Insights, techniques and ideas shared by AI leaders and experts in San Francisco from November 15 to November 18, 2021.

I attended Open Data Science Conference (ODSC) West 2021, held November 15th through 18th in San Francisco, California. ODSC is an annual tech conference that gathers leading thinkers and experts in the fields of data science, artificial intelligence, and data operations. In this article, I'll highlight some of the concepts and ideas shared at this year's conference.

DataOps for the Modern Computer Vision Stack

Source: Slides

Slide: "DataOps versus DevOps," from James Le, DataOps for the Modern Computer Vision Stack, ODSC 2021.

Data Advocate James Le introduces an ML-empowered DataOps workflow and its benefits for computer vision

In computer vision, DataOps means building a high-quality training dataset. This entails asking questions such as: What data? Where to find that data? How much data? How to validate that data? What defines quality? Where to store that data? How to organize that data?

DataOps focuses on improving the workflows of the data systems and analytic models used by data analysts and data engineers. By using machine learning to develop best practices for big data systems, organizations can make core workflows better, faster, and cheaper.

In his presentation, James Le described how DataOps practices fit perfectly within the computer vision context and the ways in which individual systems can be composed into platforms that provide value for computer vision organizations. James closed his presentation by describing the future of modern machine learning systems and introducing the Canonical Stack (CS) methodology as a blueprint for delivering better ML infrastructure in the future.

Machine Learning With Graphs: Going Beyond Tabular Data

Source: GitHub

Image: network visualization (Unsplash).

Data Science Advocate Clair J. Sullivan introduces Graph Theory and how it can be applied to Machine Learning challenges

... by considering the relationships among those individual data points, models can be significantly enhanced and measurable improvements can be made to the appropriate metrics of that model. Such use cases can include common data science and machine learning tasks such as churn prediction and automated recommendation engines.

Dr. Sullivan introduced a graph-theory approach to machine learning that can improve model results. When using graphs, it is possible to enrich data with features that encode the relationships between data points, such as graph embeddings. These additional features can help to reduce model noise and boost accuracy.
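As a minimal illustration of the idea (my own sketch, not code from the talk, which centered on learned graph embeddings), the snippet below uses networkx to derive two simple relationship features, node degree and PageRank, and joins them onto a tabular dataset. Every column name and value here is hypothetical.

```python
import networkx as nx
import pandas as pd

# Build a graph from a hypothetical edge list of customer interactions.
edges = pd.DataFrame({"src": [1, 1, 2, 3], "dst": [2, 3, 3, 4]})
G = nx.from_pandas_edgelist(edges, source="src", target="dst")

# Relationship-based features: how connected and how central each node is.
degree = dict(G.degree())
pagerank = nx.pagerank(G)

# Join the graph features onto the original per-customer table.
customers = pd.DataFrame({"customer_id": [1, 2, 3, 4],
                          "tenure_months": [12, 3, 8, 24]})
customers["degree"] = customers["customer_id"].map(degree)
customers["pagerank"] = customers["customer_id"].map(pagerank)
print(customers)  # tabular data, now enriched with graph structure
```

A churn model trained on this enriched table can then learn from how customers relate to one another, not just from their individual attributes.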

Tutorial: Using BERT with PyTorch

Source: Colab

Figure: overview of the BERT neural network and training process, from the ODSC 2021 BERT tutorial. BERT is a computational model that converts words into numbers, creating word embeddings that can be fine-tuned for many different use cases. (Devlin et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," Google AI Language, 2019.)

Author and founder Chris McCormick demonstrates fine-tuning natural language processing models in PyTorch with the Hugging Face Transformers library

In this tutorial, we will use BERT to train a text classifier. Specifically, we will take the pre-trained BERT model, add an untrained layer of neurons on the end, and train the new model for our classification task.

BERT (Bidirectional Encoder Representations from Transformers) is a powerful transfer learning model used in natural language processing (NLP). BERT can be used to create word embeddings that help machines understand the nuances of words in sentences. It's been used extensively to create question answering systems, improve search accuracy, and help to fight online hate speech and bullying.
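To make "converting words into numbers" concrete, here is a minimal sketch (mine, not the tutorial's code) that pulls contextual embeddings out of a pre-trained BERT model with the Hugging Face transformers library:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# Tokenize a sentence and run it through the pre-trained encoder.
inputs = tokenizer("The bank raised interest rates.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional vector per token, sensitive to surrounding context:
# "bank" here gets a different vector than it would in "river bank".
embeddings = outputs.last_hidden_state
print(embeddings.shape)  # (1, num_tokens, 768)
```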

Chris McCormick showed how to use BERT to create custom sentence classification applications by adding more layers to the pre-trained model; in deep learning, this process is called fine-tuning. With BERT and only about 30 minutes of training investment, he demonstrated that it is possible to significantly improve a model's results.
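Assuming the Hugging Face transformers library, the fine-tuning setup looks roughly like the sketch below; the toy texts, labels, and hyperparameters are my placeholders, not the tutorial's actual values.

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# num_labels adds a fresh, randomly initialized linear layer on top of BERT.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

texts = ["great talk!", "hard to follow"]   # toy examples
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# One training step: the model computes the classification loss internally.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
```

Because the encoder weights start from the pre-trained checkpoint, only a small number of epochs at a low learning rate is typically needed, which is what keeps the training investment so short.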

Practical Machine Learning for Computer Vision

Source: GitHub

Image: Practical Machine Learning for Computer Vision, O'Reilly Media.

O'Reilly authors reveal best practices in Deep Learning Image Classification

Machine learning on images revolutionizes healthcare, manufacturing, retail, and many other sectors. Many previously difficult problems can now be solved by training machine learning models to identify objects in images.

Practical Machine Learning for Computer Vision authors Lakshmanan and Gillard presented a 3-hour workshop on implementing computer vision applications, describing best practices along the way. In the tutorial they showed how to prepare ML datasets, train a transfer learning model, export a saved model instance, deploy the result, and fine-tune the pipeline.
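As a rough sketch of that workflow's shape in Keras (the dataset path, backbone choice, and class count below are my placeholders, not the authors' code):

```python
import tensorflow as tf

# Prepare the dataset: load a labeled image folder as a tf.data pipeline.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/flowers", image_size=(224, 224), batch_size=32)

# Transfer learning: reuse a pre-trained backbone with frozen weights.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, pooling="avg")
base.trainable = False

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # scale to [-1, 1]
    base,
    tf.keras.layers.Dense(5, activation="softmax"),  # 5 hypothetical classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=3)

# Export a saved model instance that a serving system can later reload.
model.save("exported/flowers_model")
```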

Among the many helpful things shared, the authors provided a number of tips specific to CNN models. Here are a few of their suggestions for maximizing results (a code sketch showing where these land follows the list):

  • L1 and L2 regularization to penalize large weights
  • Batch normalization to normalize and scale layer inputs
  • ReLU activation to add non-linearity without saturating gradients
  • Dropout to randomly drop nodes so the network does not come to rely on single paths
  • Early stopping to prevent the model from overfitting
  • Hyperparameter tuning, which trains the network multiple times to find the best-performing configuration
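As an illustrative sketch (my own Keras code, not the authors'), here is where several of these tips typically land in a small CNN:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(
        32, 3, input_shape=(224, 224, 3),
        kernel_regularizer=tf.keras.regularizers.l2(1e-4)),  # L2 penalty
    tf.keras.layers.BatchNormalization(),  # normalize layer inputs
    tf.keras.layers.ReLU(),                # non-linear activation
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.5),          # randomly drop nodes in training
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Early stopping halts training once validation loss stops improving.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True)
# model.fit(train_ds, validation_data=val_ds, epochs=50,
#           callbacks=[early_stop])
```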