EDGE Project

AI Recognition: Photo and Voice Recognition

Southern Utah University Junior/Senior EDGE Project (2019)


  • Created: 2019-06-02
  • Updated: 2022-10-04
  • License: MIT

Junior/Senior Year EDGE Project (SUU 2019): Photo and Voice Recognition software intended to be a very basic Artificial Intelligence with basic Machine Learning.

The software is a Visual Studio project written in C#. It utilizes a microphone and camera to learn people's faces and names. Once a person is added, it will attempt to recognize that individual the next time it sees them.

This software is currently only designed for Windows systems, specifically Windows 10 with .NET.


Getting Started

Installation

Clone or fork tybayn/edge-project to your local machine. Extract the files if necessary.


Prerequisites

Compiler

This software was built in the Microsoft Visual Studio Community 2019 IDE and is intended to be compiled in the same IDE. It will likely work with newer versions of Visual Studio.

Opening "tybaynEDGEproject.sln" will open the entire solution.

External Libraries

Many parts of this software are linked to or dependent on other libraries and source code. Any source code that is not directly compatible with the MIT license has been excluded from this repository and must be downloaded/included separately.

Any and all external libraries are owned and copyrighted by their respective authors and licensed under their respective licenses and ARE NOT included under the MIT license of this repository. They are used in an unaltered state and simply linked to.

Excluded NuGet Packages

The following libraries can simply be added using Visual Studio's NuGet Package manager:

  • Accord.NET (v3.8.0) -by Accord.NET
  • Accord.Vision (v3.8.0) -by Accord.NET
  • NAudio (v1.10.0) -by Mark Heath

Excluded Third Party Libraries

The following are not available on NuGet; their .dll files should be downloaded and referenced within Visual Studio:

Included Third Party Libraries

The "WebCam_Capture.dll" library is licensed under MIT and is included. It is located at "tybaynEDGEproject/lib/WebCam_Capture.dll" and only needs to be referenced within Visual Studio.

NOTICE

PLEASE READ THE "NOTICE" DOCUMENT FOR MORE INFORMATION ON THE EXTERNAL LIBRARIES REFERENCED AND THEIR LICENSING.

File Structure

Ensure that the following file structure is maintained as the project is downloaded:


tybaynEDGEproject.sln
tybaynEDGEproject/
|  App.config
|  AudioCompareHandler.cs
|  AudioHandler.cs
|  FileHandler.cs
|  Form1.cs
|  Form1.Designer.cs
|  Form1.resx
|  Helper.cs
|  PitchShift.cs
|  Program.cs
|  tybaynEDGEPproject.csproj
|  VideoCompareHandler.cs
|  VideoHandler.cs
|  WaveFormVisualizer.cs
|  WebCam.cs
|
└──bin/
|  └──data/
|  |  └──resources/
|  |  |  |  faceOverlay.png
|  |  |
|  |  ...
|  ...
|
└──lib/
|  |  WebCam_Capture.dll
|
...
          

Using the Software

This project is intended to be run from Visual Studio. To run the software, open "tybaynEDGEproject.sln" in Visual Studio and run the project from there.


Agreement

By using this software you confirm that you have read this document, the NOTICE document pertaining to external and excluded libraries, and the LICENSE document, and agree to all the terms within.


About the Project

This next section contains the documents and research required by the university for this project (literally, all of it). So unless you enjoy walls of text and what essentially amounts to journal entries, you are free to stop reading here!

Project Description

For this project I will design, create, and code an Artificial Intelligence (AI) written in C# (.NET Framework). This AI will be able to record a person's face and voice and remember them, then later recall that data and compare it against a live feed of a person's face and voice to determine whether the two sources match.

This project encompasses ideas of self-learning and the ability of a computer to distinguish valid input from invalid input. To see the full project description, all the steps along the way, the final product, and my reflections about the project, see the sections below.

Abstract

For my EDGE project I will design, create, and code an Artificial Intelligence (AI) written in C#. This AI will be able to record a person's face and voice and remember them, then later recall that data and compare it against a live feed of a person's face and voice to see if they match. Building it will allow for the personal discovery of how data is represented and stored in a computer, how audio and video are compared in that representation, and how AIs can be used in different scenarios and for different purposes. I plan to build this AI to prove to myself that I can learn and push myself further than what is required, and to use it as a standout point on my resume. It will also give me experience programming something that is greatly needed in the working world but not widely taught in universities.

Larger Purpose

I have chosen to build this project to give me experience in writing, coding, and creating an Artificial Intelligence. It will allow me to gain an understanding of how computers view live images and digital conversions of analog audio and how to compare them.

Project Goal

For my EDGE project I will create an interactive Artificial Intelligence that correctly identifies and recognizes faces and voices (video and audio). The AI will compare live audio from a microphone and live video from a webcam against stored audio and video files. The AI will be able to learn new people and recognize them later.

Outcomes

  • If I create an AI that can prompt for and record new people, then I will have a better understanding of machine learning, how computers “remember” data, and how that data is represented in a computer.
  • If I create an AI, then I will begin to understand the usefulness and applications of different kinds of AIs.

Deliverables

I will deliver a zip file that contains the actual AI software executable files alongside all the needed dependency files and “memory” files. I will also provide the paperwork with the design process and any subprograms used for testing functionality of parts of the software.

Differentiation

Artificial Intelligence is not a subject taught at SUU or many other universities, yet the ideas behind it are in great demand. By creating an AI I will gain a deeper understanding of a rapidly expanding field that companies of all types are looking toward to optimize and accelerate processes. With this project I will be able to include AI creation on my resume and have a foundation for creating other AIs in the future.

Estimated Timeline

  • 05/06/2019 - Begin Project
  • 05/12/2019 - Complete layout of software and flowcharting
  • 05/19/2019 - Finish UI/UX and GUI Design
  • 06/02/2019 - Complete Image and Audio recording software
  • 06/16/2019 - Complete Image and Audio comparison software
  • 06/19/2019 - Complete Text-to-Speech software
  • 06/25/2019 - Finish Driver Program
  • 06/27/2019 - Debugging and Testing
  • 06/28/2019 - Complete Project

This section simply depicts the expected timeline and design requirements. The actual timeline and design documents are outlined below.


Design Docs

Flowchart:

Pseudocode:

Audio Comparator


Compare Audio{
  -Will receive two samples and will use a .NET
  library to compare the waveforms
  -Returns a double from 0 - 1 of the percentage
  value of how similar they are
}
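The project itself is C#, but the comparator idea above — two sample buffers in, a 0-1 similarity out — can be sketched in a few lines of Python. Normalized cross-correlation is one possible measure; it is an illustrative assumption here, not necessarily what the .NET library uses:

```python
import math

def compare_audio(a, b):
    """Return a 0-1 similarity for two equal-length sample buffers
    using normalized cross-correlation (illustrative measure only)."""
    if len(a) != len(b) or not a:
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 0.0
    # Correlation lies in [-1, 1]; map it onto the 0-1 range described above
    return (dot / norm + 1) / 2
```

Identical buffers score 1.0, uncorrelated (orthogonal) buffers 0.5, and perfectly inverted buffers 0.0.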
          

Audio Handler


Initialize Microphone{
 -Call existing library to start live feed
 -Put data into Global Variable
}
Adjust {
 -Receives volume to adjust the microphone object
}
Display Thread{
 -While(true){
  -Get audio from microphone
  -Display waveform to gui display
  -Refresh gui
 }
}
Get sample{
 -Get audio from microphone
 -Store in temporary memory
 -Return audio (or audio address)
}
Start New User{
 -Display sentence to screen
 -Get audio from microphone
 -Return audio (or audio address)
}
          

Data Handler


Constructor{
   -Create Video Comparator
   -Create Audio Comparator
}
Store Image{
   -Receives image
   -Stores image data (maybe with tree?)
   -Associate image with name
}
Store audio{
   -Receives sample
   -Stores sample data
   -Associate sample with name
}
Compare Video{
   -Passes image to Video Comparator
   -Returns true or false
}
Compare Audio{
   -Passes sample to audio Comparator
   -Returns true or false
}
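The Data Handler above associates stored samples with a name and reduces a comparator's percentage score to true/false. A minimal Python sketch of that bookkeeping follows; the 0.7 threshold and the best-match strategy are illustrative assumptions, not the project's actual values:

```python
class DataHandler:
    """Minimal sketch of the Data Handler above: stores samples by name
    and turns a comparator's 0-1 score into a match decision.
    THRESHOLD is an assumed illustrative value."""
    THRESHOLD = 0.7

    def __init__(self, compare_fn):
        self.compare_fn = compare_fn   # e.g. an audio or video comparator
        self.samples = {}              # name -> list of stored samples

    def store(self, name, sample):
        self.samples.setdefault(name, []).append(sample)

    def best_match(self, sample):
        """Return (name, score) for the closest stored person,
        or (None, score) if nothing clears the threshold."""
        best_name, best_score = None, 0.0
        for name, stored in self.samples.items():
            for s in stored:
                score = self.compare_fn(sample, s)
                if score > best_score:
                    best_name, best_score = name, score
        if best_score >= self.THRESHOLD:
            return best_name, best_score
        return None, best_score
```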
          

Driver


Main{
   -Create Data Handler
   -Create Audio Handler
   -Create Video Handler
   -Start running Listening Thread
}
Listen Thread{
 -While(true){
  -Get image from webcam
  -If( image contains a new form)
   -Prompt User to speak sentence or say name
   -Call Video Comparator to see if image matches
   -If(image is not found)
    -Goto Add new user
   -Else
    -Call Audio Comparator for audio matches
    -If( audio is not found)
     -Goto Add new user
    -Else
     -Get user name
     -Prompt user if they are this person
     -If(not the user)
      -Goto Add new User
 -}
}
Add new User{
 -Prompt user if they want to be added
 -If( they want to be added)
  -Prompt User for Name
  -Have user line up on camera, take photo
  -Have user say their name/sentence
  -Call Data Storage to store name, image, and audio
 -return
}
Text to speech{
 -Receives a string and passes it to the text-to-speech handler
}
          

Text to Speech


Speak{
 -Receives a string and uses the .NET library to
 convert it to an audio clip
}
          

Video Comparator


Compare Video{
 -Will receive two photos and will use the .NET video
 compare library to compare them
 -Returns a double from 0 - 1 of the percentage value of
 how similar they are
}
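As with the audio comparator, the 0-1 score here can be illustrated with a trivial pixel-difference measure. The actual project relies on a face-comparison library; this Python sketch of mean absolute pixel difference is only a stand-in for the idea of a normalized similarity:

```python
def compare_images(a, b):
    """0-1 similarity for two equal-sized grayscale images given as flat
    lists of 0-255 pixel values (illustrative stand-in only; the project
    uses face comparison, not raw pixel differencing)."""
    if len(a) != len(b) or not a:
        return 0.0
    mad = sum(abs(x - y) for x, y in zip(a, b)) / len(a)  # mean abs difference
    return 1.0 - mad / 255.0
```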
          

Video Handler


Initialize Camera{
 -Call existing library to start live feed
 -Put data into Global Variable
}
Adjust {
 -Reinitialize the camera to adjust for light and focus
}
Display Thread{
 -While(true){
  -Get image from camera
  -Display image to gui display
  -Refresh gui
 }
}
Get image{
 -Get image from camera
 -Store in temporary memory
 -Return image (or image address)
}
Start New User{
 -Place face template over camera image
 -Get image from camera
 -Return image
}
          

GUI Design:

GUI Design 1 (Wireframe)



GUI Design 2 (Wireframe)


Timeline and Updates

05/06/2019

  • Created flowchart Diagram

05/08/2019

  • Wrote class pseudocode and started GUI sketches

05/13/2019

  • Started building the GUI; researched webcam and waveform libraries

05/14/2019

  • GUI created; started implementing graphical representations of live audio and live video. Both are now in the GUI.

05/15/2019

  • Modularized the code: moved the audio code into an audio handler class and the video code into a video handler class. Ensured that the parts were modular and loosely coupled. Started research on face detection.

05/16/2019

  • Working on text-to-speech. Ran into a few unexpected issues with keeping the mic muted while speech output stays active. Due to compatibility issues with text-to-speech, the feature is being scrapped. Attempting to fix a threading issue with audio capture.

05/20/2019

  • Fixed the threading issue with both video and audio capture. Added new buttons to the front panel. Started and finished the data control class. Started working on the video comparison class and searching for a library that can tell whether the faces in two separate photos belong to the same person.

05/22/2019

  • Added a capture button and a face template to the GUI. Also changed the boundaries of face recognition: faces must be within the boundary for recognition to start the compare procedure, which required mirroring the feed instead of showing it as-is. The face comparison class is nearly complete and will change as testing finishes. Started on audio comparison.

05/23/2019

  • Started working on the ability to add new people and began finalizing the process for comparing people. Still researching FFTs of WAV data for audio comparison. Audio recordings are already implemented, but they are not yet being compared during testing.

05/24/2019

  • I started working on the audio comparisons. FFT-based comparison is still largely an open research area, and I don't want to put too much time into this project or do a PhD yet, so I will be using a different method to test audio. I am going to look into converting the WAV files to strings using speech-to-text, and then comparing the strings.
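The string-comparison half of this approach can be sketched with Python's standard difflib; the transcription itself would come from a speech-to-text engine, and the 0.8 threshold is an assumed value, not the project's:

```python
from difflib import SequenceMatcher

def names_match(spoken, stored, threshold=0.8):
    """Compare two transcribed phrases case-insensitively.
    Returns (matched, ratio); the 0.8 threshold is illustrative."""
    ratio = SequenceMatcher(None, spoken.lower(), stored.lower()).ratio()
    return ratio >= threshold, ratio
```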

05/26/2019

  • I got the audio comparisons working, and small-scale testing looks good. That part is imported into the main program, and the driver is now complete. The comparison function now utilizes both the video and audio comparison functions. Commenting and inline documentation are done.

05/27/2019

  • I started testing with more people. The face recognition works surprisingly well, but the voice recognition does not; there seems to be too much background static for the speech-to-text to work.

05/29/2019

  • Doing some research on how to fix the speech-to-text issue; will try changing the recording format from 8-bit to 16-bit and from stereo to mono.
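The proposed format change amounts to a small PCM conversion. A minimal sketch, assuming unsigned 8-bit interleaved stereo input and signed 16-bit mono output:

```python
def to_16bit_mono(samples_8bit_stereo):
    """Convert interleaved unsigned 8-bit stereo PCM (0-255) to signed
    16-bit mono (-32768..32767). Assumes [L, R, L, R, ...] interleaving."""
    out = []
    for i in range(0, len(samples_8bit_stereo) - 1, 2):
        left, right = samples_8bit_stereo[i], samples_8bit_stereo[i + 1]
        mono8 = (left + right) // 2          # average the two channels
        out.append((mono8 - 128) * 256)      # recenter and widen to 16-bit
    return out
```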

06/02/2019

  • Doing some final testing to check whether the new microphone settings help. They do, but we also needed to create additional audio files with the pitch shifted down so that people with higher voices can be recognized as well. After this change, testing went smoothly: the AI correctly identified people about 90% of the time, though only two of us were testing.
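The pitch-shift-down step can be approximated by resampling. This naive Python sketch lowers pitch by reading the buffer more slowly, which also stretches duration; a real pitch shifter (as the project's PitchShift.cs presumably implements) would preserve duration:

```python
def shift_pitch_down(samples, factor=1.25):
    """Naive pitch shift via linear-interpolation resampling: reading the
    signal 'factor' times more slowly lowers pitch by that factor, at the
    cost of also lengthening the clip."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    out = []
    pos = 0.0
    step = 1.0 / factor          # advance more slowly -> lower pitch
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        # Linear interpolation between adjacent input samples
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
        pos += step
    return out
```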

06/02/2019

  • Testing is complete. Got the files ready for turn-in and made a video to showcase the project.

Final Product

The actual code and software project are located in my GitHub repository: tybayn/edge-project


Video Demo


Reflection

This was one of my first big projects that I did by myself with no guidance or requirements. It allowed me to branch out and figure out what I know, what I was able to learn, and what I still need to work on. In the end, I am proud of the results and the software created.

I have found that this may not truly be an AI, but rather machine learning in its most basic form. I still have a lot to learn, but I found this project to be enjoyable.


Version 1.0 (2 June, 2019)

Initial Release