This is Part I of a two-part series. For Part II and to see the results of the categorisation, click here!
Datalytyx attended the October Azure AI Hackathon hosted by Microsoft in London. This was a team-based technical event for Microsoft partners, focused on:
- Prototyping
- Coding
- Whiteboard design
- Development
Each partner turned up with a team and either a project or a business problem they wanted to solve. The event gave the teams three days to focus on their specific project or problem. Microsoft supported the teams by providing excellent help and advice from its Azure and AI experts.
Companies turned up with a variety of projects, from implementing basic machine learning on clean datasets using Microsoft Azure Machine Learning Studio to building bots using the Microsoft Bot Framework.
Our aim for the event was to develop familiarity with both the Microsoft Azure Databricks and Snowflake environments and to get hands-on with some data. We used Snowflake Enterprise Data Warehouse because it enabled us to perform queries in memory without copying the data; this architecture gave us the speed and scalability we needed to produce analysis on the fly at an event such as the Hackathon. Azure Databricks notebooks provide an environment for analytics, such as clustering, that are better suited to fully fledged data science languages such as R and Python than to standard SQL. One feature we found particularly useful when performing exploratory analysis was the “One-click visualization”, which can be accessed using SQL queries.
We came with the Seattle Public Library’s collection inventory dataset. This dataset contains a variety of information about items in the library’s collection; however, it has no genre column.
We worked to extract features from the dataset to categorise this collection into similar groupings, resembling genres. We took two approaches, both based on a column containing the item description as a text string. The first was keyword extraction: picking out words which are likely to correspond to a specific genre. The second combined tf-idf vectorisation with k-means clustering to group all of the data by the similarity of the item descriptions in one process.
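To give a feel for the two approaches, here is a minimal sketch in plain Python. It is not the code we ran at the Hackathon: the keyword lists, sample descriptions and function names are all hypothetical, and the k-means step uses a simple deterministic "farthest point" initialisation rather than anything a production library would do. In practice you would more likely reach for an established library inside a notebook.

```python
import math
import re
from collections import Counter

# Hypothetical keyword lists, for illustration only.
GENRE_KEYWORDS = {
    "mystery": {"detective", "murder", "mystery"},
    "cooking": {"cooking", "recipes", "cookbook"},
}

def tokenize(text):
    """Lowercase a description and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def keyword_genres(description):
    """Approach 1: return every genre whose keyword list matches the text."""
    tokens = set(tokenize(description))
    return sorted(g for g, words in GENRE_KEYWORDS.items() if tokens & words)

def tfidf_vectors(docs):
    """Approach 2, step 1: L2-normalised tf-idf vectors (one dict per document)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed idf so that no term gets a zero or negative weight.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in a.keys() | b.keys())

def kmeans(vectors, k, iterations=10):
    """Approach 2, step 2: basic k-means; returns one cluster label per vector."""
    # Deterministic initialisation: start from the first vector, then keep
    # adding the vector farthest from the centroids chosen so far.
    centroids = [dict(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(dict(farthest))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if not members:
                continue  # keep the old centroid if a cluster empties
            total = Counter()
            for v in members:
                total.update(v)
            centroids[c] = {t: w / len(members) for t, w in total.items()}
    return labels

# Made-up item descriptions standing in for the library dataset.
descriptions = [
    "Murder mystery detective novel",
    "Detective investigates a murder",
    "Cooking recipes for vegetarian dinners",
    "Vegetarian cooking recipes and baking",
]
labels = kmeans(tfidf_vectors(descriptions), k=2)
for text, label in zip(descriptions, labels):
    print(label, keyword_genres(text), text)
```

The keyword approach only labels items that contain a word you thought of in advance, whereas the clustering approach assigns every item to a group but leaves you to work out what each group means.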
We found the help of the Microsoft experts invaluable, and attending this Hackathon gave us exposure to a wealth of knowledge which we would not have had access to under normal circumstances. Microsoft also held lightning talks throughout the Hackathon on a variety of topics, including deep machine learning, ethical AI and bots.
We’re all looking forward to the next one!