About

Emerging data scientist with expertise in applied machine learning, research methods, and data visualization. Contact me for opportunities via email or LinkedIn.

Basic Information
Email:
dustin.luchmee@protonmail.com
Location:
Philadelphia, PA
Language:
English
Citizenship:
United States
Professional Skills

  • Python
  • R
  • SQL
  • Data Analysis
  • Data Wrangling
  • Data Cleaning
  • Exploratory Data Analysis
  • Feature Engineering
  • Data Visualization
  • Data Storytelling
  • Statistics
  • Experimental Design
  • Regression Analysis
  • Hypothesis Testing
  • Machine Learning
  • Natural Language Processing (NLP)
  • Pandas
  • Scikit-Learn
  • TensorFlow
  • PyTorch
  • AWS Data Analytics (In Progress)
  • AWS Machine Learning (In Progress)
  • Technical Writing
  • Presentation Skills
  • Project Management
  • Research
PROJECTS

Project Title: Literature Skimming with TensorFlow

Techniques Used:

  • TensorFlow
  • spaCy
  • Scikit-Learn
  • Deep Sequence Modeling
  • Model Evaluation and Comparison
  • Matplotlib

Summary: This project demonstrates the use of TensorFlow to build a deep learning model for scientific paper summarization. Articles from PubMed were scraped and preprocessed to create a dataset. Next, five models were developed and compared on accuracy, precision, and recall. The best-performing model was then tested on new data from PubMed.

Project Title: Food Vision with TensorFlow

Techniques Used:

  • TensorFlow Datasets
  • Creation of a Preprocessing Function
  • Batching and Preparing Data for Modeling
  • Creating Modeling Callbacks
  • Mixed-Precision Training
  • Feature Extraction Models
  • Fine-Tuning
  • Viewing Training Results on TensorBoard

Summary: This project involves building a deep learning model in TensorFlow to classify food images. It uses transfer learning with an EfficientNet model pre-trained on the ImageNet dataset. Data augmentation techniques such as rotation, zooming, and flipping are applied to improve generalization. The model is then fine-tuned by unfreezing and training selected layers on the food classification task. Accuracy, precision, recall, and F1 score are used to assess model performance. The project notebook walks through each implementation step and provides insights into model evaluation and prediction visualization.
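As a hedged illustration of the metrics named above (not the project's own code), the following plain-Python sketch computes accuracy and macro-averaged precision, recall, and F1 from lists of true and predicted labels. The class names are hypothetical.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for actual, predicted in zip(y_true, y_pred):
        if actual == predicted:
            tp[actual] += 1
        else:
            fp[predicted] += 1  # a wrong prediction of `predicted`
            fn[actual] += 1     # a missed example of `actual`
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0  # 0 when never predicted
        recall = tp[label] / r_den if r_den else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels for four test images
y_true = ["pizza", "pizza", "sushi", "steak"]
y_pred = ["pizza", "sushi", "sushi", "steak"]
metrics = classification_metrics(y_true, y_pred)  # metrics["accuracy"] == 0.75
```

Macro averaging weights every class equally, which matters when some food classes have far fewer images than others.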

Project Title: 2019 Product Sales

Techniques Used:

  • NumPy
  • Pandas
  • Seaborn
  • Matplotlib
  • Probability Calculation

Summary: Sales data were downloaded from Kaggle. Exploratory data analysis, data visualization, and probability calculations were performed to identify the best month for sales, when customers buy products, and how likely customers are to purchase a given product in the future.
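A minimal sketch of one such probability calculation, assuming each order is represented as a list of product names: the empirical probability that an order contains a product is the share of orders that include it. The product names below are hypothetical placeholders, not values from the Kaggle dataset.

```python
from collections import Counter


def purchase_probabilities(orders):
    """Estimate P(an order contains product) from historical orders.

    `orders` is a list of orders, each an iterable of product names.
    """
    if not orders:
        raise ValueError("need at least one order")
    counts = Counter()
    for order in orders:
        counts.update(set(order))  # count each product at most once per order
    total = len(orders)
    return {product: n / total for product, n in counts.items()}


# Hypothetical orders
orders = [
    ["cable", "headphones"],
    ["cable"],
    ["batteries", "cable"],
    ["headphones"],
]
probs = purchase_probabilities(orders)  # e.g. probs["cable"] == 0.75
```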

Project Title: Big 5 Personality Inventory Analysis

Techniques Used:

  • Exploratory data analysis
  • Data visualization
  • NumPy
  • Cosine Similarity
  • Seaborn

Summary: The Big 5 Personality Inventory is an assessment that measures personality through questions covering five broad dimensions: openness, conscientiousness, extraversion, agreeableness, and neuroticism. The assessment can give someone a sense of self-awareness, help them find roles or workplace environments they would enjoy, and can even be used for dating!
While popular, the Big 5 Personality Inventory has limitations. First, many critics are concerned about the absence of a comprehensive underlying theory. Second, individuals may answer questions in a way they deem socially acceptable rather than true to their own nature. Third, personality changes over time as individuals mature or face new situations in life, so test results are not fully stable. Nonetheless, this was a fun project that connected data science with psychology!
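The techniques list mentions cosine similarity. A minimal sketch of how two respondents' trait profiles could be compared with it, assuming each profile is a vector of five mean trait scores on a 1-5 scale. Subtracting the scale midpoint (3) first is an assumption made here so that the similarity reflects whether two people deviate from neutral in the same directions; without centering, all-positive scores tend to produce similarities close to 1. The scores are hypothetical.

```python
import math

TRAITS = ("openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism")
SCALE_MIDPOINT = 3.0  # assumption: trait scores are means on a 1-5 scale


def center(scores):
    """Express each trait score as a deviation from the neutral midpoint."""
    return [s - SCALE_MIDPOINT for s in scores]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical respondents, scores listed in TRAITS order
person_a = [4.2, 3.8, 2.1, 4.0, 2.5]
person_b = [4.0, 3.5, 2.6, 4.4, 2.2]
similarity = cosine_similarity(center(person_a), center(person_b))
```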

Project Title: Spotify Playlist Classification

Techniques Used:

  • Exploratory data analysis
  • NumPy
  • Pandas
  • Seaborn
  • Scikit-Learn
  • Logistic regression
  • SVM

Summary: Playlists were classified using the variables artist, album, danceability, energy, key, loudness, mode, speechiness, instrumentalness, liveness, valence, and tempo. Logistic regression was compared against several variations of SVM. The data were scraped using Spotify's API, and the playlists selected were ones I had recently listened to.

Project Title: Restaurant Recommendation

Techniques Used:

  • Exploratory data analysis
  • Text cleaning
  • TF-IDF
  • Cosine Similarity
  • NLTK
  • Matplotlib

Summary: Each day, millions of people look for restaurants to try. In this project, I developed and deployed a content-based restaurant recommendation system that represents customer reviews with TF-IDF and uses cosine similarity to recommend similar restaurants.
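A minimal, self-contained sketch of a content-based recommender of this kind, written in plain Python rather than with the libraries the project used. It assumes each restaurant's reviews are concatenated into a single document, weights terms with a smoothed IDF (the same formula scikit-learn's `TfidfVectorizer` uses by default), and ranks the other restaurants by cosine similarity. Restaurant names and review text are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document.

    Raw term counts times the smoothed IDF log((1 + N) / (1 + df)) + 1.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency: one count per document
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(tokens).items()} for tokens in tokenized]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend(name, reviews, k=2):
    """Return the k restaurants whose review text is most similar to `name`'s."""
    names = list(reviews)
    vectors = dict(zip(names, tfidf_vectors([reviews[n] for n in names])))
    target = vectors[name]
    scored = [(cosine(target, vectors[n]), n) for n in names if n != name]
    scored.sort(reverse=True)  # highest similarity first
    return [n for _, n in scored[:k]]


reviews = {  # hypothetical restaurants and concatenated review text
    "Noodle Bar": "spicy ramen rich broth noodles",
    "Ramen House": "ramen noodles broth spicy miso",
    "Taco Spot": "tacos salsa spicy carnitas",
    "Pasta Place": "fresh pasta noodles tomato sauce",
}
recommendations = recommend("Noodle Bar", reviews)
```

Because rare terms such as "ramen" and "broth" receive higher IDF weights than common ones like "spicy", restaurants sharing distinctive vocabulary rank highest.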

Project Title: Sentiment analysis of Amazon product reviews

Techniques Used:

  • Exploratory data analysis
  • Text cleaning
  • spaCy
  • Scikit-learn
  • WordCloud
  • Sentiment analysis

Summary: Each day, millions of people leave reviews on Amazon about their experience with different products. Sentiment analysis of specific product types can reveal both customer preferences and frustrations. Amazon reviews of electronic products were explored, cleaned, and analyzed to determine which products were most successful and which were least liked by customers.

Project Title: Using Twitter tweets and news headlines to predict stock market day return

Techniques Used:

  • Exploratory data analysis
  • Text cleaning
  • Linear regression
  • Scikit-learn
  • WordCloud
  • Decision tree regression

Summary: Stock market prediction attracts wide interest because investors are eager to earn additional income. The stock market is volatile and can react to cultural events, such as when Manuel Locatelli shoved a Coke bottle away and demanded water. Using linear regression and decision tree regression, I examined whether the day return can be predicted from tweets and news headlines.
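A minimal sketch of the linear regression step, assuming each trading day has already been reduced to one numeric feature (for example, an average sentiment score of that day's tweets and headlines; this feature is hypothetical and not described in the project). With a single feature, ordinary least squares has a closed form: the slope is the sum of (x - mean_x)(y - mean_y) divided by the sum of (x - mean_x) squared.

```python
def fit_simple_ols(x, y):
    """Fit y ~ intercept + slope * x by ordinary least squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # line passes through (mean_x, mean_y)
    return intercept, slope


def predict(intercept, slope, x):
    return [intercept + slope * xi for xi in x]


# Toy, perfectly linear data (not project results): y = 1 + 2x
intercept, slope = fit_simple_ols([0, 1, 2, 3], [1, 3, 5, 7])
```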

Project Title: Extracting tweet-like summaries from the news

Techniques Used:

  • Continuous Bag of Words (CBOW)
  • Scikit-learn
  • Text and data mining (TDM)
  • Inverse document frequency (IDF)
  • Logistic regression
  • ROUGE-L
  • Matplotlib

Summary: News stories can be condensed further into tweet-like summaries. To produce them, we used a statistical engine, computed inverse document frequencies, trained a logistic regression classifier, and evaluated the precision, accuracy, and recall of our summaries. Box-and-whisker plots were used to visualize these metrics.
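ROUGE-L, listed among the techniques, scores a summary by the longest common subsequence (LCS) it shares with a reference. The sketch below is a simplified version that assumes lowercasing, whitespace tokenization, and an F-score weighting precision and recall equally; the official ROUGE toolkit adds options such as stemming and a weighting parameter. The sentences are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)  # dynamic-programming row for the previous token of a
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L precision, recall, and F1 (equal weighting) for one sentence pair."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = lcs / len(cand)  # share of the candidate covered by the LCS
    recall = lcs / len(ref)      # share of the reference covered by the LCS
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


candidate = "police arrest suspect in downtown robbery"
reference = "police arrest a suspect after downtown bank robbery"
scores = rouge_l(candidate, reference)
```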

Project Title: N-Gram Language Modeling with CBOW statistics

Techniques Used:

  • Continuous Bag of Words (CBOW)
  • N-gram counting
  • Language modeling
  • Scikit-learn
  • Recitation function
  • Perplexity performance evaluation
  • Rambling functions

Summary: This project was performed to learn language modeling techniques. Using a statistical engine, I built n-gram frequency counts, a language model with a model sampler, recitation and rambling functions, and a perplexity-based evaluation function, and applied them to summarize a news story about an upcoming Robert Downey Jr. film.
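A minimal sketch of a bigram language model with a sampler and a perplexity evaluation, in plain Python. Add-one (Laplace) smoothing is an assumption made here so that unseen bigrams get non-zero probability; the project summary does not say which smoothing, if any, was used, and the function name `ramble` is only illustrative. Perplexity is the exponential of the average negative log-probability per predicted token, so lower values mean the model finds the text less surprising.

```python
import math
import random
from collections import Counter


def train_bigram(tokens):
    histories = Counter(tokens[:-1])             # how often each token starts a bigram
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab = set(tokens)                          # closed vocabulary from training text
    return histories, bigrams, vocab


def bigram_prob(prev, word, histories, bigrams, vocab):
    # Add-one smoothing: (count(prev, word) + 1) / (count(prev) + |V|)
    return (bigrams[(prev, word)] + 1) / (histories[prev] + len(vocab))


def perplexity(tokens, histories, bigrams, vocab):
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to score bigrams")
    pairs = list(zip(tokens, tokens[1:]))
    log_prob = sum(math.log(bigram_prob(p, w, histories, bigrams, vocab)) for p, w in pairs)
    return math.exp(-log_prob / len(pairs))


def ramble(start, n_words, histories, bigrams, vocab, seed=0):
    """Generate text by repeatedly sampling the next word from the bigram model."""
    rng = random.Random(seed)
    words = sorted(vocab)  # fixed order so a given seed is reproducible
    out = [start]
    for _ in range(n_words):
        weights = [bigram_prob(out[-1], w, histories, bigrams, vocab) for w in words]
        out.append(rng.choices(words, weights=weights, k=1)[0])
    return out


tokens = "the cat sat on the mat the cat ate".split()
histories, bigrams, vocab = train_bigram(tokens)
pp = perplexity(tokens, histories, bigrams, vocab)
sample = ramble("the", 5, histories, bigrams, vocab)
```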

Project Title: Conversational disentanglement

Techniques Used:

  • Tokenization
  • PyTorch
  • Loss function
  • Train function
  • Evaluation function
  • Position embedding
  • Time embedding

Summary: Conversational disentanglement separates interleaved messages into distinct conversation threads, which helps identify which speakers express which sentiments or attitudes. This project involved tokenization, constructing a PyTorch dataset, designing the network architecture, defining a loss function, building an optimizer, creating an evaluation function, and implementing position and time embeddings. In addition, a linear layer was used to evaluate the success of the network.

Project Title: Abstractive Summarization of Scientific Papers with BART

Techniques Used:

  • BART
  • Hugging Face
  • Text summarization
  • ROUGE metrics

Summary: Scientific papers are summarized in an abstract. In this project, my team and I fine-tuned a pre-trained BART model to perform abstractive summarization of a scientific document and generate its abstract. The project also gave us further experience with, and understanding of, Hugging Face.

Project Title: New York City Airbnb Open Data via Kaggle

Techniques Used:

  • Exploratory data analysis
  • Pipeline building
  • Linear Regression
  • Decision Tree Regression

Summary: Airbnb listings within the same area are priced differently for various reasons. In this assignment, I first performed exploratory data analysis with Matplotlib to explore the numerical variables. Next, I used scikit-learn to split the data into training and test sets. After this, I built a pipeline to produce usable training and test data. Lastly, I compared the root mean square error (RMSE) of linear regression and decision tree regression to see which model best predicted Airbnb prices from the available variables.
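The model comparison step can be sketched as computing the root mean square error, sqrt(mean((y - y_hat) ** 2)), of each model's test-set predictions and keeping the model with the lowest value. The prices and predictions below are hypothetical toy values, not results from the assignment.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error: sqrt(mean((y - y_hat) ** 2))."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical nightly prices in the test set and each model's predictions
y_test = [100, 150, 200]
predictions = {
    "linear regression": [110, 140, 210],
    "decision tree regression": [100, 170, 180],
}
scores = {name: rmse(y_test, preds) for name, preds in predictions.items()}
best_model = min(scores, key=scores.get)  # lower RMSE = better fit
```

RMSE is in the same units as the target (here, price), which makes the difference between models easy to interpret.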

Project Title: Income Classification

Techniques Used:

  • Exploratory data analysis
  • Seaborn
  • Logistic regression
  • Scikit-learn
  • Support vector machine model
  • Polynomial-kernel support vector machine model
  • Naïve Bayes model
  • K-nearest neighbor
  • Hyperparameter tuning

Summary: Predicting individual income is of interest for addressing social problems such as poverty. In this assignment, classification techniques were tested to determine which model best predicted whether an individual earned over $50,000 USD per year. Exploratory data analysis was performed using Seaborn, and the data were split into training and test sets using scikit-learn. Next, logistic regression, SVM, polynomial-kernel SVM (pSVM), naïve Bayes, and KNN models were tested. SVM was determined to be the best model for classifying which individuals earned over $50K.
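Of the models listed, k-nearest neighbors is the simplest to show from scratch. This plain-Python sketch assumes numeric features already scaled to comparable ranges (distance-based methods are sensitive to scale), uses Euclidean distance, and predicts by majority vote among the k closest training rows. The feature values and labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest training rows."""
    if k < 1 or k > len(train_X):
        raise ValueError("k must be between 1 and the number of training rows")
    # Euclidean distance from the query to every training row, closest first
    neighbors = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, pre-scaled features (e.g. age and hours worked per week, scaled to [0, 1])
train_X = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.15],
           [0.80, 0.90], [0.90, 0.80], [0.85, 0.85]]
train_y = ["<=50K", "<=50K", "<=50K", ">50K", ">50K", ">50K"]
prediction = knn_predict(train_X, train_y, [0.82, 0.88])
```

An odd k avoids tied votes in a two-class problem like this one; in practice k is one of the hyperparameters worth tuning.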

Project Title: Supervised machine learning techniques for vehicle pricing

Techniques Used:

  • Exploratory data analysis
  • Seaborn
  • Linear regression
  • Logistic regression
  • Decision tree regression
  • Random forest regression
  • Scikit-learn

Summary: Vehicles of all brands and conditions vary in price across the world. In this project, regression techniques were employed to determine which model best predicted vehicle price. Exploratory data analysis was performed using Seaborn. Test and training data were separated using scikit-learn. Next, linear regression, logistic regression, decision tree regression, and random forest regression were tested. In this exercise, random forest regression was found to be the best predictor of vehicle pricing.

Project Title: Predicting heart disease

Techniques Used:

  • Exploratory data analysis
  • Seaborn
  • Logistic regression
  • Scikit-learn
  • Support vector machine model
  • Naïve Bayes model

Summary: Heart disease is one of the leading causes of death among adults in the United States. When people are admitted to the hospital, numerous data points are collected. In this assignment, we used common health data points to find which model most accurately predicted whether someone would develop heart disease. The models compared were logistic regression, SVM, and naïve Bayes. Logistic regression was found to be the best model for predicting whether someone is likely to develop heart disease.

Professional Experience

2019 - 2022

HappyNeuron Inc

Philadelphia, PA

Product Owner
  • Spearheaded the creation of written and video content to attract prospective clients and communicate company solutions; presented company solutions at conferences.
  • Forged and fostered strong client relationships to drive sales of company solutions and promote satisfaction and retention.
  • Facilitated training sessions to coach clients in product knowledge and offer technical support; explained complex technical concepts in easy-to-understand terms.
  • Partnered with the consulting branch of Humans Matter to boost business development by responding to proposal requests.

2016 - 2019

Moss Rehabilitation Research Institute

Elkins Park, PA

Research Assistant, Neuroplasticity and Motor Behavior Laboratory
  • Assisted in the protocol design and implementation of studies and managed timelines for participant recruitment, data collection, and data analysis.
  • Trained junior laboratory members in data collection, analysis, procedures, and use of technology, and provided orientation to the institute and laboratory.
  • Performed preliminary analyses in Excel/SPSS to describe findings in the data.
  • Worked closely with the IRB coordinator to ensure research was conducted properly and to make modifications as needed to maintain adherence to protocols.
  • Conducted literature searches as needed to support a review article and protocol design.
  • Designed and presented posters for conferences and created figures for posters and papers.
  • Provided feedback and assistance with the implementation of the new patient registry.
  • Assisted and coordinated with scientists, patients, and fundraising teams for donor presentations.

2014 - 2015

University of Pennsylvania
Center for Cognitive Neuroscience

Philadelphia, PA

Research Intern
  • Managed participant recruitment for studies and the scheduling and conducting of neuroimaging sessions.
  • Conducted neurostimulation sessions and behavioral testing and assisted senior lab members with them.
  • Shared and explained preliminary findings of studies with laboratory staff.
  • Managed diverse research studies, including one on executive functioning and the neural networks responsible for it, one on smoking cessation and messaging, and one on weight management and exercise.
EDUCATION

Dec 2021

Master's
Master of Science, Data Science

Drexel University | Philadelphia, PA

May 2013

Bachelor's
Bachelor of Arts, Psychology and Cognitive Science

University of Richmond | Richmond, VA

ACHIEVEMENTS

Publications:

  • Johnson, T., Ridgeway, G., Luchmee, D., & Kantak, S. (2021). Bimanual coordination during reach-to-grasp actions is sensitive to task goal with distinctions between left- and right-hemispheric stroke. Submitted.
  • Kantak, S., & Luchmee, D. (2020). Contralesional motor cortex is causally engaged during more dexterous actions of the paretic hand after stroke-A Preliminary report. Neuroscience Letters, 134751.
  • Kantak, S., McGrath, R., Zahedi, N., & Luchmee, D. (2017). Behavioral and neurophysiological mechanisms underlying motor skill learning in patients with post-stroke hemiparesis. Clinical Neurophysiology, 129(1), 1-12.

Research Presentations:

  • Kantak, S., & Luchmee, D. (2017, November). Dexterity requirements modulate ipsilateral motor corticospinal excitability in post-stroke individuals. Poster session presented at the annual Society for Neuroscience conference in Washington, D.C.