GitHub Copilot solves LeetCode problems. How?
A pair-programmer you should not use.
We all know Github Copilot is here and it is amazing. Although the complete potential of GitHub Copilot is far beyond the context of this article, we will concentrate on one of GitHub copilot's capabilities, which is to solve competitive programming problems.
How does Copilot work?
Before anything, Let me explain the way GitHub copilot works. We provide some comments explaining what we are trying to do and the copilot gives potential code snippets based on the requirements.
Does it work? I have tried Copilot on one of my live projects and I'm working on another article regarding my experience with my AI pair programmer.
So, this is the right time to hit that follow button on my blog to make sure that you check that out. Otherwise, an easier way can be to follow me on Twitter and I will remind you there.
Time for some Leetcode problems
Let's start with an easy problem first. Two Sum is the problem of choice. Now, it's time for Vscode to get in the action.
GitHub copilot is installed on my vscode and it's active.
Comments
We already saw that we need some comments to inform our AI partner, what we need. As we are solving a coding problem here, the question description of the problem will do the job fine.
# Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to the target.
# You may assume that each input would have exactly one solution, and you may not use the same element twice.
I used the comments above to get started.
Function Signature
As recommended by the Copilot, it works the best under circumstances where we have a function that solves a specific problem.
Since the problem we are having here is small, the start-up code given by leetcode can be the function signature we need, to give the copilot some idea.
class Solution:
def twoSum(self, nums: List[int], target: int) -> List[int]:
Once we have the function signature in place, just wait...
BOOM!!
The most efficient solution is directly here. Pressing the tab will just get that code on your file.
Just one solution? There is more!
You can open the copilot's dedicated tab by pressing Ctrl + enter
. There will be 10 potential code snippets from which you can choose what you want. Some of which are very similar. Coz, we don't have 10 different solutions to this problem.
The code similarity can range from similar time complexity and logic with different variable names to exactly the same code.
But, I was satisfied with the variety of solutions it was able to come up with. Here is a couple of them.
1. Brute force O(n^2)
for i in range(len(nums)):
for j in range(i+1,len(nums)):
if nums[i] + nums[j] == target:
return [i,j]
2. Dictionary-based algorithm with better naming conventions
nums_hash = {}
for i, num in enumerate(nums):
if target - num in nums_hash:
return [nums_hash[target - num], i]
nums_hash[num] = i
There are some issues to address. Firstly, the naming conventions are bad π . Let's get there.
Let's understand Copilot
Before getting wowed by the ability of AI as our coding partner, it's our responsibility to actually understand how it works. An oversimplified definition of Machine learning would go like
"An algorithm that solves a problem, but with enormous data-sets along with the ability to identify similarities."
I am not going to bore you by saying those jargon words like OpenAI, GPT-3 and CodeX.
Instead, let's concentrate on what data-set the model is trained from. GitHub claims that it used source code from publicly available sources, including code in the public repositories of GitHub.
Note: Training machine learning models on publicly available data is considered fair use across the machine learning community. - GitHub
The publicly available sources also include StackOverflow. So, does that mean the Copilot is just StackOverflow on steroids? Probably.
How do we get code snippets that can be directly executed?
We get code snippets that can be directly executed with one click for a leetcode problem through AI. But how?
The simple understandable reason for that is that many programmers and competitive programmers have the habit of documenting their coding solutions on GitHub by uploading the solutions of the leetcode problems on their GitHub repository which include myself. π
We should also remember that AI will not be able to generate the required code snippets all the time. Especially new programmers need to be careful if you are fiddling around copilot in your projects because fitting the code snippets in the gaps without raising any issue is a skill by itself.
If you can fit a StackOverflow snippet seamlessly in your codebase, you will probably enjoy your AI-driven coding partner. Otherwise, this is a No-No.
Conclusion - Dataset is the key π
It all boils down to the fact that the data set that was used to train the model has the exact code snippet you need to solve the problem. This also addresses the reason for bad naming conventions as most Competitive Programmers don't care about good naming. n for number, s for string, d for dictionary
is what most people use π.
I wouldn't recommend anyone to use copilot to solve leetcode problems as it defeats the purpose of learning and improving your coding ability. Just take your time, learn and do it.
Coming next...
I have gone the extra mile to actually use copilot on a production application. My experience on that project is coming soon.
Make sure to subscribe to the newsletter and follow me on Twitter to keep track of this and for more updates.
Check out my profile on Github! I have curated a list of the learning resources that has worked for me so far and you can pick one for yourself from there. Feel free to starβ the repository to show your support.
Thanks for spending your time here. I hope that you got away some value.
Comment your thoughts on the ability of the GitHub Copilot and AI in general.