Python 3 Practice Programs
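Remove the leading spaces from the string input_str = ' This is my first code'
input_str = ' This is my first code'
# lstrip() removes only leading whitespace; strip() would also remove trailing whitespace
final_str = input_str.lstrip()
print(final_str)
O/P: This is my first code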
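A quick comparison of the three stripping methods may help. The sample string below, padded on both sides, is made up for illustration:

```python
# Sample string with spaces on both ends (illustrative only)
text = '   This is my first code   '

left = text.lstrip()    # removes leading whitespace only
right = text.rstrip()   # removes trailing whitespace only
both = text.strip()     # removes whitespace from both ends

print(repr(left))
print(repr(right))
print(repr(both))
```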
String Split
Description Split the string input_str = 'Kumar_Ravi_003' into the person's second name, first name and unique customer code. In this example, second_name = 'Kumar', first_name = 'Ravi', customer_code = '003'.
A sample output of the input 'Kumar_Ravi_003' is: Ravi Kumar 003
Note that you need to print in the order first name, last name and customer code.
# First Method: slicing at fixed positions (works only for this exact layout)
input_str = 'Kumar_Ravi_003'
first_name = input_str[6:10]
second_name = input_str[0:5]
customer_code = input_str[11:14]
print(first_name, second_name, customer_code)
O/P: Ravi Kumar 003
# Second Method: split on '_'
input_str = 'Kumar_Ravi_003'
name = input_str.split('_')
first_name = name[1]
second_name = name[0]
customer_code = name[2]
print(first_name + " " + second_name + " " + customer_code)
O/P: Ravi Kumar 003
# Third Method
input_str = 'Kumar_Ravi_003'
# Split the string on '_': the first name is the 2nd element (index 1),
# the second name is the 1st element (index 0)
n_list = input_str.split("_")
first_name = n_list[1]
second_name = n_list[0]
customer_code = n_list[2]
print(first_name, second_name, customer_code)
O/P: Ravi Kumar 003
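Since split('_') returns exactly three parts here, they could also be unpacked in a single line. This sketch assumes the input always contains exactly two underscores; otherwise the unpacking raises ValueError.

```python
input_str = 'Kumar_Ravi_003'

# Unpack the three parts directly (raises ValueError if there are not exactly 3)
second_name, first_name, customer_code = input_str.split('_')

print(first_name, second_name, customer_code)  # Ravi Kumar 003
```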
List_remove_append
Description Remove SPSS from input_list=['SAS', 'R', 'PYTHON', 'SPSS'] and add 'SPARK' in its place.
input_list = ['SAS', 'R', 'PYTHON', 'SPSS']
# Write code to remove 'SPSS'
input_list.remove('SPSS')
# Write code to append 'SPARK'
input_list.append("SPARK")
print(input_list)
O/P: ['SAS', 'R', 'PYTHON', 'SPARK']
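append() only puts 'SPARK' in the place of 'SPSS' because 'SPSS' happens to be the last element. A sketch that replaces the item at its original position, wherever it is:

```python
input_list = ['SAS', 'R', 'PYTHON', 'SPSS']

# list.index() finds the position of 'SPSS'; assigning to that slot keeps
# 'SPARK' in the same place even if 'SPSS' were not the last element
position = input_list.index('SPSS')
input_list[position] = 'SPARK'

print(input_list)  # ['SAS', 'R', 'PYTHON', 'SPARK']
```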
string to list conversion
Description Convert a string input_str = 'I love Data Science & Python' to a list by splitting it on ‘&’. The sample output for this string will be: ['I love Data Science ', ' Python']
input_str = 'I love Data Science & Python'
# Split the string at '&'
output_list = input_str.split('&')
print(output_list)
O/P: ['I love Data Science ', ' Python']
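The expected output keeps the spaces around '&'. If clean items are wanted instead (beyond what the exercise asks), one option is to strip each part:

```python
input_str = 'I love Data Science & Python'

# Split on '&', then remove the surrounding spaces from each piece
parts = [part.strip() for part in input_str.split('&')]

print(parts)  # ['I love Data Science', 'Python']
```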
List to String
Description Convert a list ['Pythons syntax is easy to learn', 'Pythons syntax is very clear'] to a string using ‘&’. The sample output of this string will be: Pythons syntax is easy to learn & Pythons syntax is very clear
Note that there is a space on both sides of '&' (as usual in English sentences).
input_str = ['Pythons syntax is easy to learn', 'Pythons syntax is very clear']
string_1 = " & ".join(input_str)
print(string_1)
O/P: Pythons syntax is easy to learn & Pythons syntax is very clear
Nested List
Description Extract Python from a nested list input_list = [['SAS','R'],['Tableau','SQL'],['Python','Java']]
input_list = [['SAS','R'],['Tableau','SQL'],['Python','Java']]
print(input_list[2][0])
O/P: Python
Tuple
Description Add the element ‘Python’ to a tuple input_tuple = ('Monty Python', 'British', 1969). Since tuples are immutable, one way to do this is to convert the tuple to a list, add the element, and convert it back to a tuple.
To convert the list back to a tuple, pass it to the built-in tuple() function.
input_tuple = ('Monty Python', 'British', 1969)
list1 = list(input_tuple)
list1.append('Python')
print(tuple(list1))
O/P: ('Monty Python', 'British', 1969, 'Python')
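Converting to a list is one way; another is tuple concatenation, which builds the new tuple directly:

```python
input_tuple = ('Monty Python', 'British', 1969)

# Concatenation creates a new tuple; note the trailing comma in ('Python',),
# without it the parentheses would just be a plain string
new_tuple = input_tuple + ('Python',)

print(new_tuple)  # ('Monty Python', 'British', 1969, 'Python')
```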
Dict_Error
Description From a Dictionary input_dict={'Name': 'Monty', 'Profession': 'Singer' }, get the value of a key ‘Label’ which is not a part of the dictionary, in such a way that Python doesn't hit an error. If the key does not exist in the dictionary, Python should return 'NA'.
# Method 1: dict.get() returns the default instead of raising KeyError
input_dict = {'Name': 'Monty', 'Profession': 'Singer'}
print(input_dict.get('Label', 'NA'))
O/P: NA
# Method 2: check membership first
input_dict = {'Name': 'Monty', 'Profession': 'Singer'}
if 'Label' in input_dict:
    answer = input_dict['Label']
else:
    answer = 'NA'
print(answer)
O/P: NA
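A third option, sketched here, is to attempt the lookup and catch the KeyError that a missing key raises:

```python
input_dict = {'Name': 'Monty', 'Profession': 'Singer'}

try:
    answer = input_dict['Label']   # raises KeyError because 'Label' is missing
except KeyError:
    answer = 'NA'

print(answer)  # NA
```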
Getting a Value from a Dictionary.
Description Extract the company headed by Tim Cook from the dictionary {'Jack Dorsey': 'Twitter', 'Tim Cook': 'Apple','Jeff Bezos': 'Amazon','Mukesh Ambani': 'RJIO'}
input_dict={'Jack Dorsey': 'Twitter', 'Tim Cook': 'Apple','Jeff Bezos': 'Amazon','Mukesh Ambani': 'RJIO'}
company = input_dict['Tim Cook']
print(company)
O/P: Apple
List of Values in a Dictionary.
Description Create a SORTED list of all values from the dictionary input_dict = {'Jack Dorsey' : 'Twitter' , 'Tim Cook' : 'Apple','Jeff Bezos' : 'Amazon' ,'Mukesh Ambani' : 'RJIO'}
input_dict = {'Jack Dorsey' : 'Twitter' , 'Tim Cook' : 'Apple','Jeff Bezos' : 'Amazon' ,'Mukesh Ambani' : 'RJIO'}
value_list = input_dict.values()
print(sorted(value_list))
O/P: ['Amazon', 'Apple', 'RJIO', 'Twitter']
What will the output of the following set of instructions be?
d = {'Python':40, 'R':45}
print(list(d.keys()))
O/p : ['Python', 'R']
Set_diff
Description Find the difference, using difference and symmetric_difference, between two given lists - list1 and list2.
First, convert the lists into sets and store them as set_1 and set_2. Then store the difference and symmetric difference in answer_1 and answer_2 respectively. Print both answers as sorted lists, i.e. convert the final sets to lists, sort them and then print them.
list_1 = [1,2,3,4,5,6]
list_2 = [2,3,4,5,6,7,8,9]
set_1 = set(list_1)
set_2 = set(list_2)
answer_1 = sorted(set_1.difference(set_2))  # sorted() already returns a list
answer_2 = sorted(set_1.symmetric_difference(set_2))
print(answer_1)
print(answer_2)
o/p
[1]
[1, 7, 8, 9]
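The same results can be written with Python's set operators, where - is difference and ^ is symmetric difference:

```python
set_1 = {1, 2, 3, 4, 5, 6}
set_2 = {2, 3, 4, 5, 6, 7, 8, 9}

answer_1 = sorted(set_1 - set_2)  # same as set_1.difference(set_2)
answer_2 = sorted(set_1 ^ set_2)  # same as set_1.symmetric_difference(set_2)

print(answer_1)  # [1]
print(answer_2)  # [1, 7, 8, 9]
```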
If-Else
Description Write code to check whether the string in input_str starts with a vowel. Print YES or NO in capital letters.
For example, if input_str = 'analytics', the output should be YES.
# Method 1
input_str = "alpha"
if input_str[0] in ['a', 'e', 'i', 'o', 'u']:
    print('YES')
else:
    print('NO')
O/P: YES
# Method 2
input_str = "alpha"
first_char = input_str[0]
if first_char in "aeiou":
    print('YES')
else:
    print('NO')
O/P: YES
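Both methods above recognise only lowercase vowels and would raise IndexError on an empty string. A slightly more defensive variant could look like this; the function name starts_with_vowel is hypothetical, and the case-insensitive check goes beyond what the exercise asks:

```python
def starts_with_vowel(text):
    # lower() makes the check case-insensitive; startswith() accepts a tuple of
    # prefixes and returns False for an empty string instead of raising IndexError
    return 'YES' if text.lower().startswith(('a', 'e', 'i', 'o', 'u')) else 'NO'

print(starts_with_vowel('analytics'))  # YES
print(starts_with_vowel('Umbrella'))   # YES
print(starts_with_vowel(''))           # NO
```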
What will the following segment of code print? Try solving it verbally.
if True or True:
    if False and True or False:
        print('A')
    elif False and False or True and True:
        print('B')
    else:
        print('C')
else:
    print('D')
O/P : B
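The key to the answer is that and binds more tightly than or. Evaluating the two inner conditions with explicit parentheses shows why B is printed:

```python
# 'and' is evaluated before 'or', so the conditions group like this:
cond_if = (False and True) or False                # False or False -> False
cond_elif = (False and False) or (True and True)   # False or True  -> True

print(cond_if, cond_elif)  # False True
```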
What will the following segment of code print? Try doing this verbally.
if (10 < 0) and (0 < -10):
    print("A")
elif (10 > 0) or False:
    print("B")
else:
    print("C")
O/P: B
Creating a List Comprehension
Description You are given an integer 'n' as the input. Create a list comprehension containing the squares of the integers from 1 to n (including both 1 and n), and print the list.
For example, if the input is 4, the output should be a list as follows:
[1, 4, 9, 16]
#Method1
n = int(input('Enter number'))
square=[i**2 for i in range(1,n+1)]
print(square)
Enter number4
[1, 4, 9, 16]
# Method 2
n = int(input('Enter number'))
# Use range(1, n+1): range(n) would give 0, 1, 2, ..., n-1,
# but we want 1, 2, 3, ..., n
final_list = [i**2 for i in range(1, n+1)]
print(final_list)
Function
Description Create a function squared(), which takes x and y as arguments and returns the value of x**y. For example, if x = 2 and y = 3, the output is 8.
input_list = ['6','7']
x = int(input_list[0])
y = int(input_list[1])
def squared(x, y):
    return x**y
print(squared(x, y))
O/P: 279936
Lambda
Description Create a lambda function 'greater', which takes two arguments x and y and returns x if x > y, otherwise y. If x = 2 and y = 3, the output should be 3.
#Method1
input_list = [4,5]
a = int(input_list[0])
b = int(input_list[1])
greater = lambda x, y: x if x > y else y
print(greater(a, b))
O/P: 5
#Method2
input_list = [4,5]
a = int(input_list[0])
b = int(input_list[1])
# The same logic as a regular function
def greater(a, b):
    if a > b:
        return a
    return b
print(greater(a, b))
O/P: 5
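Print a Word a Given Number of Times
def say(message, times=1):
    # repeat the string 'times' times; times defaults to 1
    print(message * times)
say('Hello')
say('World', 5)
O/P:
Hello
WorldWorldWorldWorldWorld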
Map Function
Description Using the Map function, create a list 'cube', which consists of the cube of numbers in input_list.
For example, if the input list is [5,6,4,8,9], the output should be [125, 216, 64, 512, 729]
input_list = [5,6,4,8,9]
cube = list(map(lambda x: x**3, input_list))
print(cube)
O/P: [125, 216, 64, 512, 729]
Map Function
Description Using the function Map, count the number of words that start with ‘S’ in input_list.
input_list = ['San Jose', 'San Francisco', 'Santa Fe', 'Houston']
# Each comparison yields True or False; sum() counts True as 1
count = sum(map(lambda x: x[0] == 'S', input_list))
print(count)
O/P: 3
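For comparison, the same count written as a generator expression with str.startswith(), which also copes with empty strings:

```python
input_list = ['San Jose', 'San Francisco', 'Santa Fe', 'Houston']

# Add 1 for every city whose name starts with 'S'
count = sum(1 for city in input_list if city.startswith('S'))

print(count)  # 3
```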
Map Function
Description Create a list ‘name’ in which each first name from list 1 is combined with the corresponding second name from list 2.
For example, if the input list is: [ ['Ankur', 'Avik', 'Kiran', 'Nitin'], ['Narang', 'Sarkar', 'R', 'Sareen']]
the output should be the list: ['Ankur Narang', 'Avik Sarkar', 'Kiran R', 'Nitin Sareen']
input_list = [['Ankur','Avik','Kiran','Nitin'],['Narang','Sarkar','R','Sareen']]
first_name = input_list[0]
last_name = input_list[1]
combine=lambda x,y:x+' '+y
name = list(map(combine,first_name,last_name))
print(name)
O/P : ['Ankur Narang', 'Avik Sarkar', 'Kiran R', 'Nitin Sareen']
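map() with two iterables pairs the elements up by position, much like zip(). A list-comprehension version for comparison:

```python
first_name = ['Ankur', 'Avik', 'Kiran', 'Nitin']
last_name = ['Narang', 'Sarkar', 'R', 'Sareen']

# zip() pairs items by position, as map() does when given two iterables
name = [first + ' ' + last for first, last in zip(first_name, last_name)]

print(name)  # ['Ankur Narang', 'Avik Sarkar', 'Kiran R', 'Nitin Sareen']
```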
Filter Function
Description You are given a list of strings such as input_list = ['hdjk', 'salsap', 'sherpa'].
Extract a list of names that start with an ‘s’ and end with a ‘p’ (both 's' and 'p' are lowercase) in input_list.
Note: Use the filter() function
input_list = ['hdjk', 'salsap', 'sherpa']
# startswith()/endswith() are case-sensitive, matching the lowercase requirement
sp = list(filter(lambda x: x.startswith('s') and x.endswith('p'), input_list))
print(sp)
O/P: ['salsap']
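The same filter written as a list comprehension. The extra item 'Slap' is added here only for illustration; it is excluded because its 'S' is uppercase:

```python
input_list = ['hdjk', 'salsap', 'sherpa', 'Slap']

# Keep words that start with a lowercase 's' and end with a lowercase 'p'
sp = [word for word in input_list if word.startswith('s') and word.endswith('p')]

print(sp)  # ['salsap']
```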
Reduce Function
Description Using the Reduce function, concatenate a list of words in input_list, and print the output as a string. If input_list = ['I','Love','Python'], the output should be the string 'I Love Python'.
input_list=['All','you','have','to','fear','is','fear','itself']
from functools import reduce
result=reduce(lambda x,y: x+" "+y, input_list)
print(result)
O/P: All you have to fear is fear itself
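reduce() works here, but the idiomatic tool for joining strings is str.join(), which gives the same result:

```python
input_list = ['All', 'you', 'have', 'to', 'fear', 'is', 'fear', 'itself']

# join() inserts the separator between consecutive items
result = ' '.join(input_list)

print(result)  # All you have to fear is fear itself
```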
Reduce Function
Description You are given a list of numbers such as input_list = [31, 63, 76, 89]. Find and print the largest number in input_list using the reduce() function
input_list = [65,76,87,23,12,90,99]
from functools import reduce
answer = reduce(lambda x,y: x if x>y else y,input_list)
print(answer)
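Outside an exercise that requires reduce(), the built-in max() gives the same answer:

```python
from functools import reduce

input_list = [65, 76, 87, 23, 12, 90, 99]

# reduce() keeps the larger of the running result and the next item
largest_reduce = reduce(lambda x, y: x if x > y else y, input_list)
largest_builtin = max(input_list)

print(largest_reduce, largest_builtin)  # 99 99
```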
How will you extract ‘love’ from the string S = “I love Python”?
S = "I love Python"
print(S[2:6])
print(S[2:-7])
print(S[-11:-7])
love
love
love
Dictionary Iteration
What will the output be of the following code?
D = {1:['Raj', 22], 2:['Simran', 21], 3:['Rahul', 40]}
for val in D:
    print(val)
O/P (iterating over a dictionary yields its keys):
1
2
3
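Looping over D gives only the keys. To get keys and values together, iterate over D.items(); this sketch also unpacks each [name, age] list:

```python
D = {1: ['Raj', 22], 2: ['Simran', 21], 3: ['Rahul', 40]}

rows = []
for key, (name, age) in D.items():
    print(key, name, age)
    rows.append((key, name, age))
```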
Python Comprehensions
What will the ‘comprehension equivalent’ be for the following snippet of code?
for sentence in paragraph:
    for word in sentence.split():
        single_word_list.append(word)
Answer
[word for sentence in paragraph for word in sentence.split()]
Feedback :
[word for sentence in paragraph for word in sentence.split()] is the right comprehension equivalent of the code provided. You need to put it in square brackets [] since the output will be a list.
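A small check that the loop and the comprehension build the same list. The sample paragraph is made up for illustration:

```python
paragraph = ['The quick brown fox', 'jumps over the lazy dog']

# Loop version
single_word_list = []
for sentence in paragraph:
    for word in sentence.split():
        single_word_list.append(word)

# Comprehension version: the for clauses appear in the same order as the nested loops
comprehension_list = [word for sentence in paragraph for word in sentence.split()]

print(single_word_list == comprehension_list)  # True
```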
Function Arguments
What will the output of the following code be?
def my_func(*args):
    # *args collects all positional arguments into a tuple
    return sum(args)
print(my_func(1, 2, 3, 4, 5))
print(my_func(6, 7, 8))
O/P:
15
21
How will you compute the squares of all the numbers in a list L = [1, 2, 3, 4]?
L = [1, 2, 3, 4]
print(list(map(lambda x : x ** 2, L)))
[1, 4, 9, 16]
Factorial
Description Given a number ‘n’, output its factorial using reduce(). Note: Make sure you handle the edge case of zero. As you know, 0! = 1
P.S.: Finding the factorial without using the reduce() function might lead to deduction of marks.
Examples:
Input 1: 1 Output 1: 1
Input 2: 3 Output 2: 6
from functools import reduce

def factorial(n):
    if n == 0:
        return 1
    return reduce(lambda x, y: x * y, range(1, n + 1))

n = int(input())
print(factorial(n))
# With reduce() in a single conditional expression
from functools import reduce
n = int(input())
fact = reduce(lambda x, y: x * y, range(1, n + 1)) if n > 0 else 1 if n == 0 else 'factorial not possible'
print(fact)
# Method 2
# Read the input as an integer
n = int(input())
# Import the reduce() function
from functools import reduce
# If n is zero, simply print 1: reduce() over the empty range(1, 1) with no
# initial value would raise a TypeError
if n == 0:
    print(1)
else:
    # In all other cases, use reduce() over range(1, n+1): the lambda keeps
    # multiplying the running product x by the next number y, so when the
    # range reaches n the factorial computation is complete
    print(reduce(lambda x, y: x * y, range(1, n+1)))
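The special case for zero is only needed because reduce() is called without an initial value. Passing 1 as reduce()'s third argument (the initializer) handles n = 0 as well; this sketch cross-checks the result against math.factorial():

```python
from functools import reduce
import math

def factorial(n):
    # The initializer 1 is returned unchanged for the empty range(1, 1),
    # so n == 0 needs no special case
    return reduce(lambda x, y: x * y, range(1, n + 1), 1)

# Cross-check against the standard library for small n
for k in range(10):
    assert factorial(k) == math.factorial(k)

print(factorial(0), factorial(5))  # 1 120
```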
Counting Missing Values
Description Count the number of missing values in each column of the dataset 'marks'.
import pandas as pd
marks = pd.read_csv('https://query.data.world/s/HqjNNadqEnwSq1qnoV_JqyRJkc7o6O')
print(marks.isnull().sum())
Prefix 0
Assignment 2
Tutorial 12
Midterm 16
TakeHome 9
Final 5
dtype: int64
Removing rows with missing values
Description Remove all the rows in the dataset 'marks' having 5 missing values and then print the number of missing values in each column.
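# Method 1: dropna() with a threshold
import pandas as pd
marks = pd.read_csv('https://query.data.world/s/HqjNNadqEnwSq1qnoV_JqyRJkc7o6O')
# thresh=2 keeps rows with at least 2 non-missing values; with 6 columns,
# this drops every row that has 5 (or 6) missing values
marks = marks.dropna(thresh=2)
print(marks.isnull().sum())
# Method 2: boolean mask on the number of missing values per row
import pandas as pd
df = pd.read_csv('https://query.data.world/s/HqjNNadqEnwSq1qnoV_JqyRJkc7o6O')
df = df[df.isnull().sum(axis=1) != 5]
print(df.isnull().sum())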
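To see what thresh does, here is a tiny made-up frame with six columns, the same width as 'marks':

```python
import numpy as np
import pandas as pd

# Illustrative data only: row 0 has 5 missing values, row 1 has 1, row 2 has 2
df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [np.nan, 5, np.nan],
    'c': [np.nan, 6, 7],
    'd': [np.nan, np.nan, 8],
    'e': [np.nan, 9, np.nan],
    'f': [np.nan, 10, 11],
})

missing_per_row = df.isnull().sum(axis=1).tolist()
kept = df.dropna(thresh=2)   # keep rows with at least 2 non-missing values

print(missing_per_row)       # [5, 1, 2]
print(kept.index.tolist())   # [1, 2]
```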
Removing extra characters from a column
Description The given data frame 'customer' has a column 'Cust_id' with values Cust_1, Cust_2 and so on. Remove the repeated 'Cust_' prefix from the column Cust_id so that it holds just the numbers 1, 2, 3 and so on. Print the first 10 rows of the dataset 'customer' after processing.
# Method 1
import pandas as pd
customer = pd.read_csv('https://query.data.world/s/y9rxL9mGdP6AXPiDaIL4yYm6DsfTV2')
customer['Cust_id'] = customer['Cust_id'].str.replace("Cust_", '')
print(customer.head(10))
# Method 2
import pandas as pd
customer = pd.read_csv('https://query.data.world/s/y9rxL9mGdP6AXPiDaIL4yYm6DsfTV2')
# Caution: strip('Cust_') removes any of the characters C, u, s, t, _ from both ends,
# not the literal prefix; it works here only because every id ends in digits
customer['Cust_id'] = customer['Cust_id'].map(lambda x: x.strip('Cust_'))
print(customer.head(10))
Customer_Name Province Region Customer_Segment Cust_id
0 MUHAMMED MACINTYRE NUNAVUT NUNAVUT SMALL BUSINESS 1
1 BARRY FRENCH NUNAVUT NUNAVUT CONSUMER 2
2 CLAY ROZENDAL NUNAVUT NUNAVUT CORPORATE 3
3 CARLOS SOLTERO NUNAVUT NUNAVUT CONSUMER 4
4 CARL JACKSON NUNAVUT NUNAVUT CORPORATE 5
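The difference between strip() and true prefix removal shows up with a value that continues with those same characters. str.removeprefix() needs Python 3.9 or later (recent pandas versions also offer Series.str.removeprefix()). The value 'Cust_user' is made up for illustration:

```python
value = 'Cust_user'

stripped = value.strip('Cust_')        # removes any of C, u, s, t, _ from both ends
prefix_removed = value.removeprefix('Cust_')  # removes the exact prefix once

print(stripped)        # er
print(prefix_removed)  # user
print('Cust_12'.removeprefix('Cust_'))  # 12
```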
Rounding decimal places of a column
Description The given dataframe 'sleepstudy' has a column 'Reaction' with floating-point values of up to 4 decimal places. Round the values to 1 decimal place.
# Method1
from pydataset import data
sleepstudy =data('sleepstudy')
sleepstudy['Reaction'] = sleepstudy['Reaction'].round(1)
print(sleepstudy.head(10))
#Method2
from pydataset import data
sleepstudy =data('sleepstudy')
sleepstudy['Reaction'] = sleepstudy['Reaction'].round(decimals=1)
print(sleepstudy.head(10))
Reaction Days Subject
1 249.6 0 308
2 258.7 1 308
3 250.8 2 308
Duplicated Rows
Description The given Dataframe 'rating' has repeated rows. You need to remove the duplicated rows.
import pandas as pd
rating = pd.read_csv('https://query.data.world/s/EX0EpmqwfA2UYGz1Xtd_zi4R0dQpog')
rating_update = rating.drop_duplicates()
print(rating.shape)
print(rating_update.shape)
(1254, 5)
(1149, 5)
Derived Variable
Description The given dataset 'cust_rating' has three rating columns, 'rating', 'food_rating' and 'service_rating'. Create a new variable 'avg_rating' holding the average of the three.
import pandas as pd
cust_rating = pd.read_csv('https://query.data.world/s/ILc-P4llUraMaYN6N6Bdw7p6kUvHnj')
# round() with no digits rounds the averages to whole numbers (ties go to the even value)
cust_rating['avg_rating'] = round((cust_rating['rating'] + cust_rating['food_rating'] + cust_rating['service_rating']) / 3)
print(cust_rating.head(10))
userID placeID rating food_rating service_rating avg_rating
0 U1077 135085 2 2 2 2.0
1 U1077 135038 2 2 1 2.0
2 U1077 132825 2 2 2 2.0
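If an average that is not rounded to a whole number is wanted, DataFrame.mean(axis=1) over the three columns is a shorter alternative. This sketch uses the first three rows of the output above and keeps two decimals:

```python
import pandas as pd

# Values copied from the first rows of the output above
cust_rating = pd.DataFrame({
    'rating': [2, 2, 2],
    'food_rating': [2, 2, 2],
    'service_rating': [2, 1, 2],
})

cols = ['rating', 'food_rating', 'service_rating']
# mean(axis=1) averages across the columns of each row
cust_rating['avg_rating'] = cust_rating[cols].mean(axis=1).round(2)

print(cust_rating)
```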
Extracting Day From a Date
Description The given dataset 'order' has a variable 'Order_Date' with the dates of purchase. Create a new variable 'day' containing the day of the month from Order_Date.
import pandas as pd
order = pd.read_csv('https://query.data.world/s/3hIAtsCE7vYkPEL-O5DyWJAeS5Af-7')
order['Order_Date'] = pd.to_datetime(order['Order_Date'])
order['day'] = order['Order_Date'].dt.day
print(order.head(10))
Order_ID Order_Date Order_Priority Ord_id day
0 3 2010-10-13 LOW Ord_1 13
1 293 2012-01-10 HIGH Ord_2 10
2 483 2011-10-07 HIGH Ord_3 7
3 515 2010-08-28 NOT SPECIFIED Ord_4 28
4 613 2011-06-17 HIGH Ord_5 17
# Alternate Method
import pandas as pd
order = pd.read_csv('https://query.data.world/s/3hIAtsCE7vYkPEL-O5DyWJAeS5Af-7')
order['Order_Date'] = pd.to_datetime(order['Order_Date'])
order['day'] = order['Order_Date'].apply(lambda x: x.day)
print(order.head(10))
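The .dt accessor exposes other parts of the date too. A sketch on two of the dates from the output above:

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(['2010-10-13', '2012-01-10']))

day = dates.dt.day.tolist()
month = dates.dt.month.tolist()
year = dates.dt.year.tolist()
weekday = dates.dt.day_name().tolist()

print(day, month, year, weekday)
```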
You're given a list of non-negative integers. Your task is to round the given numbers to the nearest multiple of 10. For instance, 15 should be rounded to 20 whereas 14 should be rounded to 10. After rounding the numbers, find their sum.
Hint: Python's built-in round() rounds ties to the even neighbour ("banker's rounding"): round(0.25, 1) returns 0.2 and round(25, -1) returns 20. You might want to write your own rounding function to meet the requirement.
Sample input (a list): [2, 18, 10]
Sample output (an integer): 30
import math

input_list = [2, 18, 10]

def custom_round(n, ndigits=1):
    """
    Round a non-negative number to ndigits decimal places, with ties rounded up.
    Examples:
    0.25 is rounded to 0.3
    0.35 is rounded to 0.4
    0.21 is rounded to 0.2
    """
    part = n * 10 ** ndigits
    delta = part - int(part)
    # round to nearest, ties away from zero (valid for non-negative n)
    if delta >= 0.5:
        part = math.ceil(part)
    else:
        part = math.floor(part)
    return part / (10 ** ndigits)

def round_to_nearest_10(n):
    """Round to the nearest multiple of 10, e.g. 15 -> 20 and 14 -> 10."""
    return int(100 * custom_round(n / 100, 1))

rounded_list = list(map(round_to_nearest_10, input_list))
result = sum(rounded_list)
# do not change the following code
print(result)
O/P: 30
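For non-negative integers, plain integer arithmetic gives round-half-up to a multiple of 10 without any floating-point steps. The helper name round_half_up_to_10 is hypothetical; the last line shows why the built-in round() does not fit this task:

```python
def round_half_up_to_10(n):
    # Hypothetical helper; assumes n is a non-negative int.
    # Adding 5 before floor division pushes values ending in 5 up to the next multiple of 10
    return (n + 5) // 10 * 10

input_list = [2, 18, 10]
total = sum(map(round_half_up_to_10, input_list))

print(total)                         # 30
print(round(15, -1), round(25, -1))  # 20 20: built-in round() sends ties to the even multiple
```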
Sum and Squares
You're given a natural number 'n'. First, calculate the sum of squares of all the natural numbers up to 'n'. Then calculate the square of the sum of all natural numbers up to 'n'. Return the absolute difference of these two quantities.
For instance, if n=3, then natural numbers up to 3 are: 1, 2 and 3. The sum of squares up to 3 will be 1^2 + 2^2 + 3^2 = 14. The square of the sum of natural numbers up to 3 is (1+2+3)^2=36. The result, which is their absolute difference is 22.
Sample input (an integer): 3
Sample output (an integer): 22
n = 3
def sum_of_squares(n):
    # 1^2 + 2^2 + ... + n^2
    s = 0
    for i in range(n + 1):
        s = s + i**2
    return s
def square_of_sum(n):
    # (1 + 2 + ... + n)^2
    s = 0
    for i in range(n + 1):
        s = s + i
    return s**2
# store the result in the following variable
abs_difference = abs(square_of_sum(n) - sum_of_squares(n))
# print result --- do not change the following code
print(abs_difference)
O/P: 22
# Method 2
n = 3
numbers = [number + 1 for number in range(n)]
sum_of_squares = sum(map(lambda x: x**2, numbers))
square_of_sum = sum(numbers)**2
# store the result in the following variable
abs_difference = abs(sum_of_squares - square_of_sum)
# print result --- do not change the following code
print(abs_difference)
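Both quantities also have well-known closed forms, 1^2 + ... + n^2 = n(n+1)(2n+1)/6 and (1 + ... + n)^2 = (n(n+1)/2)^2, which avoid the loops entirely. The function name below is illustrative:

```python
def sum_square_difference(n):
    # Closed forms; integer division is exact because both numerators are divisible
    sum_of_squares = n * (n + 1) * (2 * n + 1) // 6
    square_of_sum = (n * (n + 1) // 2) ** 2
    return abs(square_of_sum - sum_of_squares)

print(sum_square_difference(3))   # 22
print(sum_square_difference(10))  # 2640
```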
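Reverse Each String in a List
def reverse(s):
    # build the reversed string by prepending each character
    result = ""
    for ch in s:
        result = ch + result
    return result
s = ['ram', 'krishn', 'mishra']
rev = list(map(reverse, s))
print(rev)
O/P: ['mar', 'nhsirk', 'arhsim']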
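Slicing with a step of -1 reverses a string directly, so the helper function can be replaced by a comprehension:

```python
s = ['ram', 'krishn', 'mishra']

# word[::-1] walks the string from the last character to the first
rev = [word[::-1] for word in s]

print(rev)  # ['mar', 'nhsirk', 'arhsim']
```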
Weird Function
In data science, quite often you need to implement research papers and write code according to what's present in those papers. Research papers have a lot of maths involved and you need to implement the maths in code. In this exercise, you're required to implement some maths in code. The problem is as follows:
For fixed integers a, b, c, define a weird function F(n) as follows:
F(n) = n - c for all n > b
F(n) = F(a + F(a + F(a + F(a + n)))) for all n ≤ b.
Also, define S(a, b, c) = ∑F(n) where n takes the values 0 to b [in other words, S(a, b, c) = F(0) + F(1) + F(2) + ... + F(b-1) + F(b)].
The input will be the values of a, b and c. The output should be S(a, b, c). You can define the functions in your own way with no restriction on the number of parameters. For example, the function S can take more parameters than a, b and c. Just make sure the code behaves as per the maths.
For example, if a = 20, b = 100 and c = 15, then F(0) = 195, F(2000) = 1985 and S(20, 100, 15) = 14245.
input_list = [20, 100, 15]
a = input_list[0]
b = input_list[1]
c = input_list[2]

def weird_function(a, b, c, n):
    if n > b:
        return n - c
    else:
        return weird_function(a, b, c, a + weird_function(a, b, c, a + weird_function(a, b, c, a + weird_function(a, b, c, a + n))))

def large_sum(a, b, c):
    total = 0
    for value in range(b + 1):
        total += weird_function(a, b, c, value)
    return total

# store the result in the following variable
result = large_sum(a, b, c)
# print result -- do not change the following code
print(result)
O/P: 14245
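The plain recursion is fine for b = 100. Working through the definition by hand (this derivation is not part of the original exercise) suggests a closed form that helps for larger b: for n in the top band b - a < n ≤ b, all four applications land above b, giving F(n) = n + 4(a - c), and each band of width a further down adds 4a - 3c (so F(0) = 0 + 20 + 5·35 = 195 in the example). The sketch below assumes a > c ≥ 0, as in the example, and cross-checks the closed form against a memoised recursion using functools.lru_cache; the function names are illustrative:

```python
from functools import lru_cache

def weird_sum_closed_form(a, b, c):
    # Assumes a > c >= 0. For 0 <= n <= b, n lies in band k = (b - n) // a
    # (k = 0 is the top band b-a < n <= b) and F(n) = n + 4*(a - c) + k*(4*a - 3*c)
    total = 0
    for n in range(b + 1):
        k = (b - n) // a
        total += n + 4 * (a - c) + k * (4 * a - 3 * c)
    return total

def weird_sum_memo(a, b, c):
    # Direct translation of the definition; each F(n) is cached after its first evaluation
    @lru_cache(maxsize=None)
    def F(n):
        if n > b:
            return n - c
        return F(a + F(a + F(a + F(a + n))))
    return sum(F(n) for n in range(b + 1))

print(weird_sum_closed_form(20, 100, 15))  # 14245
print(weird_sum_memo(20, 100, 15))         # 14245
```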
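Python Program to Check an Armstrong Number
An Armstrong number equals the sum of its digits, each raised to the power of the number of digits (for example, 153 = 1^3 + 5^3 + 3^3).
def digit(n):
    # count the number of digits in n
    count = 0
    while n != 0:
        n = n // 10
        count = count + 1
    return count
def armstrong(n, d):
    # sum of the digits of n, each raised to the power d
    total = 0
    while n != 0:
        rem = n % 10
        total = total + pow(rem, d)
        n = n // 10
    return total
n = int(input('Enter number'))
print(armstrong(n, digit(n)))
if n == armstrong(n, digit(n)):
    print('Number is Armstrong')
else:
    print('Number is Not Armstrong')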
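A shorter sketch that works on the digits of str(n); it assumes n is a non-negative integer, and is_armstrong is an illustrative name:

```python
def is_armstrong(n):
    digits = str(n)       # assumes n >= 0, so there is no '-' sign
    power = len(digits)
    return n == sum(int(d) ** power for d in digits)

print(is_armstrong(153), is_armstrong(10))  # True False
print([x for x in range(1000) if is_armstrong(x)])
```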
Swap two rows
Description Given m and n, swap the mth and nth rows of the 2-D NumPy array given below.
a = [[4 3 1] [5 7 0] [9 9 3] [8 2 4]]
import numpy as np
# Given array
a = np.array([[4, 3, 1], [5, 7, 0], [9, 9, 3], [8, 2, 4]])
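# Values of m and n (hard-coded here instead of being read from input)
m = 0
n = 2
# Fancy indexing on the right-hand side returns a copy, so rows m and n are swapped safely
a[[m, n]] = a[[n, m]]
# Print the array after swapping
print(a)
O/P:
[[9 9 3]
 [5 7 0]
 [4 3 1]
 [8 2 4]]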
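Fancy indexing matters here: a plain tuple swap of rows does not work on NumPy arrays, because a[m] and a[n] are views into the same array. This sketch shows both approaches on copies of the array:

```python
import numpy as np

original = np.array([[4, 3, 1], [5, 7, 0], [9, 9, 3], [8, 2, 4]])
m, n = 0, 2

# Pitfall: b[m] and b[n] are views, so row m is overwritten before
# row n is assigned from it; both rows end up as [9 9 3]
b = original.copy()
b[m], b[n] = b[n], b[m]

# Fancy indexing on the right-hand side returns a copy, so this swap is correct
a = original.copy()
a[[m, n]] = a[[n, m]]

print(b)
print(a)
```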
Create border array
Description Given a single integer n, create an (n x n) 2D array with 1 on the border and 0 on the inside.
Note: Make sure the array is of type int.
Example: Input 1: 4 Output 1: [[1 1 1 1] [1 0 0 1] [1 0 0 1] [1 1 1 1]] Input 2: 2 Output 2: [[1 1] [1 1]]
# Read the variable from STDIN
n = int(input())
import numpy as np
a=np.ones((n,n), dtype=int)
a[1:-1,1:-1] = 0
print(a)
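An alternative construction uses np.pad to surround a block of zeros with a one-cell border of ones. It assumes n ≥ 2 (for n = 1 the inner shape would be negative); border_array is a hypothetical name:

```python
import numpy as np

def border_array(n):
    # Assumes n >= 2: an (n-2) x (n-2) block of zeros, padded by 1 cell of ones on every side
    inner = np.zeros((n - 2, n - 2), dtype=int)
    return np.pad(inner, 1, mode='constant', constant_values=1)

print(border_array(4))
print(border_array(2))
```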
Set Index in Dataframe
Description Using the set_index command, set the column 'X' as the index of the dataset and then print the head of the dataset. Hint: Use inplace=False, which returns a new dataframe and leaves df unchanged.
import pandas as pd
df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')
df_2 = df.set_index('X',inplace=False)
print(df_2.head())
Y month day FFMC DMC DC ISI temp RH wind rain area
X
7 5 mar fri 86.2 26.2 94.3 5.1 8.2 51 6.7 0.0 0.0
7 4 oct tue 90.6 35.4 669.1 6.7 18.0 33 0.9 0.0 0.0
Sorting Dataframes
Description Sort the dataframe 'df' by 'month' and then by 'day', in ascending order.
import pandas as pd
df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')
df_2 = df.sort_values(by=['month','day'],ascending=True)
print(df_2.head(20))
X Y month day FFMC DMC DC ISI temp RH wind rain area
241 4 4 apr fri 83.0 23.3 85.3 2.3 16.7 20 3.1 0.0 0.00
442 6 5 apr mon 87.9 24.9 41.6 3.7 10.9 64 3.1 0.0 3.35
19 6 4 apr sat 86.3 27.4 97.1 5.1 9.3 44 4.5 0.0 0.00
239 7 5 apr sun 81.9 3.0 7.9 3.5 13.4 75 1.8 0.0 0.00
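Note that 'month' and 'day' are strings, so sort_values() orders them alphabetically (apr, aug, dec and so on), not in calendar order. One way to get calendar order is an ordered Categorical; the small frame below is made up for illustration:

```python
import pandas as pd

months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
          'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
days = ['mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun']

# Illustrative data only
df = pd.DataFrame({'month': ['oct', 'apr', 'mar', 'apr'],
                   'day': ['sat', 'mon', 'fri', 'fri']})

# An ordered Categorical sorts by the order of its categories, not alphabetically
df['month'] = pd.Categorical(df['month'], categories=months, ordered=True)
df['day'] = pd.Categorical(df['day'], categories=days, ordered=True)

df_sorted = df.sort_values(by=['month', 'day'])
print(df_sorted)
```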
DataFrames
Description Given a dataframe 'df', use the following and analyse the results: describe(), columns and shape.
import pandas as pd
df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')
print(df.describe())
print(df.columns)
print(df.shape)
X Y FFMC ... wind rain area
count 517.000000 517.000000 517.000000 ... 517.000000 517.000000 517.000000
mean 4.669246 4.299807 90.644681 ... 4.017602 0.021663 12.847292
std 2.313778 1.229900 5.520111 ... 1.791653 0.295959 63.655818
min 1.000000 2.000000 18.700000 ... 0.400000 0.000000 0.000000
Indexing Dataframes
Description Print only the even-numbered rows of the dataframe 'df' (by position), i.e. rows 2, 4, 6 and so on.
Note: Don't include the row at position zero.
import pandas as pd
df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')
df_2 = df.iloc[2::2]  # positional slice: start at row 2, step 2
print(df_2.head(20))
X Y month day FFMC DMC DC ISI temp RH wind rain area
2 7 4 oct sat 90.6 43.7 686.9 6.7 14.6 33 1.3 0.0 0.0
4 8 6 mar sun 89.3 51.3 102.2 9.6 11.4 99 1.8 0.0 0.0
6 8 6 aug mon 92.3 88.9 495.6 8.5 24.1 27 3.1 0.0 0.0
8 8 6 sep tue 91.0 129.5 692.6 7.0 13.1 63 5.4 0.0 0.0
10 7 5 sep sat 92.5 88.0 698.6 7.1 17.8 51 7.2 0.0 0.0
12 6 5 aug fri 63.5 70.8 665.3 0.8 17.0 72 6.7 0.0 0.0
Division Operators
print(17 / 3)   # classic division returns a float
print(17 // 3)  # floor division discards the fractional part
O/P:
5.666666666666667
5
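divmod() returns the floor quotient and the remainder together. Note that // rounds toward negative infinity, which matters for negative numbers:

```python
quotient, remainder = divmod(17, 3)
print(quotient, remainder)         # 5 2

# The identity (a // b) * b + a % b == a always holds
print(17 // 3 * 3 + 17 % 3 == 17)  # True

# Floor division rounds down, not toward zero
print(-17 // 3, -17 % 3)           # -6 1
```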
List_remove_append
Description Remove SPSS from input_list=['SAS', 'R', 'PYTHON', 'SPSS'] and add 'SPARK' in its place.
[ ]
input_list = ['SAS', 'R', 'PYTHON', 'SPSS']
# Write code to remove 'SPSS'
input_list.remove('SPSS')
# Write code to append 'SPARK'
input_list.append("SPARK")
print(input_list)
['SAS', 'R', 'PYTHON', 'SPARK']
string to list conversion
Description Convert a string input_str = 'I love Data Science & Python' to a list by splitting it on ‘&’. The sample output for this string will be: ['I love Data Science ', ' Python']
input_str = 'I love Data Science & Python'
#we will simply split the string at &
output_list = input_str.split('&')#Type your answer here
print(output_list)
['I love Data Science ', ' Python']
List to String
Description Convert a list ['Pythons syntax is easy to learn', 'Pythons syntax is very clear'] to a string using ‘&’. The sample output of this string will be: Pythons syntax is easy to learn & Pythons syntax is very clear
Note that there is a space on both sides of '&' (as usual in English sentences).
[ ]
input_str = ['Pythons syntax is easy to learn', 'Pythons syntax is very clear']
string_1 =" & ".join(input_str) #Type your answer here
print(string_1)
Pythons syntax is easy to learn & Pythons syntax is very clear
Nested List
Description Extract Python from a nested list input_list = [['SAS','R'],['Tableau','SQL'],['Python','Java']]
input_list = [['SAS','R'],['Tableau','SQL'],['Python','Java']]
print(input_list[2][0])
Python
Tuple
Description Add the element ‘Python’ to a tuple input_tuple = ('Monty Python', 'British', 1969). Since tuples are immutable, one way to do this is to convert the tuple to a list, add the element, and convert it back to a tuple.
To learn how to convert a list to a tuple, search for it on Google / Stack Overflow etc.
input_tuple = ('Monty Python', 'British', 1969)
list1=list(input_tuple)
list1.append('Python')
print(tuple(list1))
('Monty Python', 'British', 1969, 'Python')
[ ]
Dict_Error
Description From a Dictionary input_dict={'Name': 'Monty', 'Profession': 'Singer' }, get the value of a key ‘Label’ which is not a part of the dictionary, in such a way that Python doesn't hit an error. If the key does not exist in the dictionary, Python should return 'NA'.
[ ]
# Method 1
input_dict={'Name': 'Monty', 'Profession': 'Singer' }
input_dict.get('Label','NA')
'NA'
# Method 2
input_dict={'Name': 'Monty', 'Profession': 'Singer' }
if('Label' in input_dict.keys()):
answer = input_dict['Label']
else:
answer='NA'
print(answer)
NA
Getting a Value from a Dictionary.
Description Extract the company headed by Tim Cook from the dictionary {'Jack Dorsey': 'Twitter', 'Tim Cook': 'Apple','Jeff Bezos': 'Amazon','Mukesh Ambani': 'RJIO'}
input_dict={'Jack Dorsey': 'Twitter', 'Tim Cook': 'Apple','Jeff Bezos': 'Amazon','Mukesh Ambani': 'RJIO'}
name = input_dict['Tim Cook']
print(name)
Apple
List of Values in a Dictionary.
Description Create a SORTED list of all values from the dictionary input_dict = {'Jack Dorsey' : 'Twitter' , 'Tim Cook' : 'Apple','Jeff Bezos' : 'Amazon' ,'Mukesh Ambani' : 'RJIO'}
[ ]
1
2
3
input_dict = {'Jack Dorsey' : 'Twitter' , 'Tim Cook' : 'Apple','Jeff Bezos' : 'Amazon' ,'Mukesh Ambani' : 'RJIO'}
value_list = input_dict.values()
print(sorted(value_list))
['Amazon', 'Apple', 'RJIO', 'Twitter']
What will the output of the following set of instructions be?
d = {'Python':40, 'R':45} print(list(d.keys()))
[ ]
1
2
d = {'Python':40, 'R':45}
print(list(d.keys()))
['Python', 'R']
Set_diff
Description Find the difference, using difference and symmetric_difference, between two given lists - list1 and list2.
First, convert the lists into sets and store them as set_1 and set_2. Then store the difference and symmetric difference in answer_1 and answer_2 respectively. Print both the answers as sorted lists, i.e. convert the final sets to lists, sort it and then return it.
[ ]
1
2
3
4
5
6
7
8
9
10
list_1 = [1,2,3,4,5,6]
list_2 = [2,3,4,5,6,7,8,9]
set_1 = set(list_1)
set_2 = set(list_2)
answer_1 = sorted(list(set_1.difference(set_2)))
answer_2 = sorted(list(set_1.symmetric_difference(set_2)))
print(answer_1)
print(answer_2)
[1]
[1, 7, 8, 9]
If-Else
Description Write a code to check if the string in input_str starts with a vowel or not. Print capital YES or NO.
For example, if input_str = 'analytics' then, your output should print 'YES'.
[ ]
1
2
3
4
5
6
#method1
input_str="alpha"
if input_str[0] in ['a','e','i','o','u']:
print('YES')
else:
print('NO')
YES
[ ]
1
2
3
4
5
6
7
#Method2
input_str="alpha"
i=input_str[0]
if(i in "aeiou"):
print('YES')
else:
print('NO')
YES
What will the following segment of code print? Try solving it verbally.
[ ]
1
2
3
4
5
6
7
8
9
if True or True:
if False and True or False:
print('A')
elif False and False or True and True:
print('B')
else:
print('C')
else:
print('D')
B
What will the following segment of code print? Try doing this verbally.
[ ]
1
2
3
4
5
6
if (10 < 0) and (0 < -10):
print("A")
elif (10 > 0) or False:
print("B")
else:
print("C")
B
Creating a List Comprehension
Description You are given an integer 'n' as the input. Create a list comprehension containing the squares of the integers from 1 till n^2 (including 1 and n), and print the list.
For example, if the input is 4, the output should be a list as follows:
[1, 4, 9, 16]
[ ]
1
2
3
4
5
6
#Method1
n = int(input('Enter number'))
square=[i**2 for i in range(1,n+1)]
print(square)
Enter number4
[1, 4, 9, 16]
[ ]
1
2
3
4
5
6
7
8
9
10
#Method2
n = int(input('Enter number'))
# Write your code here (remember to print the list)
final_list=[i**2 for i in range(1,n+1)] #remember to use range(1,n+1)
#using range(n) will give 0,1,2,... n-1
#we want 1, 2, 3, 4, ... n
print(final_list)
Enter number5
[1, 4, 9, 16, 25]
Function
Description Create a function squared(), which takes x and y as arguments and returns the x**y value. For e.g., if x = 2 and y = 3 , then the output is 8.
[ ]
1
2
3
4
5
6
7
8
input_list = ['6','7']
x = int(input_list[0])
y = int(input_list[1])
def squared(x,y):
return(x**y)
print(squared(x,y))
279936
Lambda
Description Create a lambda function 'greater', which takes two arguments x and y and return x if x>y otherwise y. If x = 2 and y= 3, then the output should be 3.
[ ]
1
2
3
4
5
6
#Method1
input_list = [4,5]
a = int(input_list[0])
b = int(input_list[1])
greater=lambda x,y: x if x>y else y
print(greater(a,b))
5
[ ]
1
2
3
4
5
6
7
8
9
10
11
12
#Method2
input_list = [4,5]
a = int(input_list[0])
b = int(input_list[1])
#Write your code here
def greater(a,b):
    if(a>b):
        return a
    return b
print(greater(a,b))
5
def say(message, times = 1):
    print(message * times)
say('Hello')
say('World', 5)
Hello
WorldWorldWorldWorldWorld
Map Function
Description Using the Map function, create a list 'cube', which consists of the cube of numbers in input_list.
For example, if the input list is [5,6,4,8,9], the output should be [125, 216, 64, 512, 729].
input_list = [5,6,4,8,9]
cube=list(map(lambda x: x**3, input_list))
print(cube)
[125, 216, 64, 512, 729]
Map Function
Description Using the function Map, count the number of words that start with ‘S’ in input_list.
input_list = ['San Jose', 'San Francisco', 'Santa Fe', 'Houston']
count = sum(map(lambda x: x[0] == 'S', input_list))
print(count)
3
Map Function
Description Create a list ‘name’ consisting of the combination of the first name and the second name from list 1 and 2 respectively.
For example, if the input list is: [ ['Ankur', 'Avik', 'Kiran', 'Nitin'], ['Narang', 'Sarkar', 'R', 'Sareen']]
the output list should be the list: ['Ankur Narang', 'Avik Sarkar', 'Kiran R', 'Nitin Sareen']
input_list = [['Ankur','Avik','Kiran','Nitin'],['Narang','Sarkar','R','Sareen']]
first_name = input_list[0]
last_name = input_list[1]
combine=lambda x,y:x+' '+y
name = list(map(combine,first_name,last_name))
print(name)
['Ankur Narang', 'Avik Sarkar', 'Kiran R', 'Nitin Sareen']
Filter Function
Description You are given a list of strings such as input_list = ['hdjk', 'salsap', 'sherpa'].
Extract a list of names that start with an ‘s’ and end with a ‘p’ (both 's' and 'p' are lowercase) in input_list.
Note: Use the filter() function.
input_list = ['hdjk', 'salsap', 'sherpa']
# case-sensitive check: only a lowercase 's' at the start and 'p' at the end qualify
sp = list(filter(lambda x: x.startswith('s') and x.endswith('p'), input_list))
print(sp)
['salsap']
Reduce Function
Description Using the Reduce function, concatenate a list of words in input_list, and print the output as a string. If input_list = ['I','Love','Python'], the output should be the string 'I Love Python'.
input_list=['All','you','have','to','fear','is','fear','itself']
from functools import reduce
result=reduce(lambda x,y: x+" "+y, input_list)
print(result)
All you have to fear is fear itself
Reduce Function
Description You are given a list of numbers such as input_list = [31, 63, 76, 89]. Find and print the largest number in input_list using the reduce() function.
input_list = [65,76,87,23,12,90,99]
from functools import reduce
answer = reduce(lambda x,y: x if x>y else y,input_list)
print(answer)
99
How will you extract ‘love’ from the string S = “I love Python”?
S = "I love Python"
print(S[2:6])
print(S[2:-7])
print(S[-11:-7])
love
love
love
Dictionary Iteration
What will the following code output?
D = {1:['Raj', 22], 2:['Simran', 21], 3:['Rahul', 40]}
for val in D:
    print(val)
1
2
3
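Iterating over a dictionary directly yields only its keys, which is why the loop prints 1, 2 and 3. A minimal sketch contrasting this with `.items()`, which yields (key, value) pairs that can be unpacked in the loop header:

```python
D = {1: ['Raj', 22], 2: ['Simran', 21], 3: ['Rahul', 40]}

# Iterating the dict itself gives the keys only
keys = [key for key in D]
print(keys)  # [1, 2, 3]

# .items() gives (key, value) pairs; here each value is a [name, age] list
for key, (name, age) in D.items():
    print(key, name, age)
```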
Python Comprehensions
What will the ‘comprehension equivalent’ be for the following snippet of code?
for sentence in paragraph:
    for word in sentence.split():
        single_word_list.append(word)
Answer: [word for sentence in paragraph for word in sentence.split()]
Feedback : [word for sentence in paragraph for word in sentence.split()] is the right comprehension equivalent of the code provided. You need to put it in square brackets [] since the output will be a list.
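To check the equivalence, here is a minimal sketch that runs both versions on a hypothetical sample `paragraph` (a list of sentences, which the question leaves undefined). The `for` clauses in the comprehension appear in the same order as the nested loops:

```python
# Hypothetical sample input: a list of sentences
paragraph = ["I love Python", "Data science is fun"]

# Loop version from the question
single_word_list = []
for sentence in paragraph:
    for word in sentence.split():
        single_word_list.append(word)

# Comprehension version: outer loop first, inner loop second
comprehension_list = [word for sentence in paragraph for word in sentence.split()]

print(single_word_list == comprehension_list)  # True
```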
Function Arguments
What will the output of the following code be?
def my_func(*args):
    return sum(args)
print(my_func(1,2,3,4,5))
print(my_func(6,7,8))
15
21
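`*args` gathers any number of positional arguments into a tuple, which is why `sum(args)` works for both calls. Its keyword counterpart is `**kwargs`, which gathers extra keyword arguments into a dict. A minimal sketch; `describe` is a hypothetical function used only for illustration:

```python
def describe(*args, **kwargs):
    # *args collects extra positional arguments into a tuple
    # **kwargs collects extra keyword arguments into a dict
    return args, kwargs

positional, keyword = describe(1, 2, 3, unit="cm", precise=True)
print(positional)  # (1, 2, 3)
print(keyword)     # {'unit': 'cm', 'precise': True}
```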
How would you compute the squares of all the numbers in a list L = [1, 2, 3, 4]?
L = [1, 2, 3, 4]
print(list(map(lambda x : x ** 2, L)))
[1, 4, 9, 16]
Factorial
Description Given a number ‘n’, output its factorial using reduce(). Note: Make sure you handle the edge case of zero. As you know, 0! = 1
P.S.: Finding the factorial without using the reduce() function might lead to deduction of marks.
Examples:
Input 1: 1 Output 1: 1
Input 2: 3 Output 2: 6
from functools import reduce
n = 5  # example input; the output below corresponds to n = 5
def factorial(n):
    if n == 0:
        return 1
    else:
        return reduce(lambda x, y: x * y, range(1, n + 1))
print(factorial(n))
120
from functools import reduce
n = int(input())
# 0! = 1; the factorial of a negative number is undefined
fact = reduce(lambda x, y: x*y, range(1, n + 1)) if n > 0 else 1 if n == 0 else 'factorial not possible'
print(fact)
5
120
# Read the input as an integer
n = int(input())
# Import the reduce() function
from functools import reduce
# If n is zero, simply print 1: reduce() without an initial value raises a
# TypeError on an empty sequence such as range(1, 1)
if n==0:
    print(1)
# In all other cases, use reduce() between the range 1 and (n+1). For this range,
# define a lambda function with x and y and keep multiplying them using reduce().
# This way, when the code reaches the end of the range, i.e. n, the factorial
# computation will be complete.
else:
    print(reduce(lambda x, y: x * y, range(1, n+1)))
7
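The zero case does not need a separate branch if `reduce()` is given an initializer: for `n == 0` the range is empty and `reduce()` returns the initializer instead of raising. A minimal sketch using `operator.mul` in place of the lambda:

```python
from functools import reduce
from operator import mul

def factorial(n):
    """Factorial via reduce(); the initializer 1 makes n == 0 return 1."""
    if n < 0:
        raise ValueError("factorial is undefined for negative numbers")
    # reduce(function, iterable, initializer): with an empty range(1, 1),
    # reduce simply returns the initializer
    return reduce(mul, range(1, n + 1), 1)

print(factorial(0))  # 1
print(factorial(5))  # 120
```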
Set Operations
Description In a school, there are 20 students in total, numbered from 1 to 20. You’re given three lists named ‘C’, ‘F’, and ‘H’, representing the students who play cricket, football, and hockey, respectively. Based on this information, find and print the following:
1. Students who play all three sports
2. Students who play both cricket and football but don’t play hockey
3. Students who play exactly two of the sports
4. Students who don’t play any of the three sports
Format:
Input: 3 lists containing numbers (ranging from 1 to 20) representing the students who play cricket, football and hockey, respectively.
Output: 4 lists containing the students that satisfy each of the conditions above, in that order.
Note: Make sure you sort the final lists (in an ascending order) that you get before printing them; otherwise your answer might not match the test-cases.
Examples:
Input 1: [2, 5, 9, 12, 13, 15, 16, 17, 18, 19] [2, 4, 5, 6, 7, 9, 13, 16] [1, 2, 5, 9, 10, 11, 12, 13, 15]
Output 1: [2, 5, 9, 13] [16] [12, 15, 16] [3, 8, 14, 20]
Explanation:
1. Given the three sets, you can see that the students numbered '2', '5', '9', and '13' play all three sports.
2. The student numbered '16' plays cricket and football but doesn't play hockey.
3. The students numbered '12' and '15' play cricket and hockey, and the student numbered '16' plays cricket and football. No student plays only football and hockey. Hence, the students who play exactly two sports are 12, 15, and 16.
4. The students who play none of the sports are 3, 8, 14, and 20.
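One way to solve this exercise is with Python's set operators (`&` intersection, `|` union, `-` difference). A minimal sketch, where `sports_report` is a hypothetical helper name and the total of 20 students is taken from the description:

```python
def sports_report(C, F, H, total_students=20):
    # Convert the input lists to sets so that &, | and - are available
    C, F, H = set(C), set(F), set(H)
    all_three = C & F & H
    cricket_and_football_not_hockey = (C & F) - H
    # In at least one pairwise intersection, minus those who play all three
    exactly_two = ((C & F) | (C & H) | (F & H)) - all_three
    # Students 1..total_students who appear in none of the lists
    none = set(range(1, total_students + 1)) - (C | F | H)
    # Sort each result, since the test cases expect ascending lists
    return [sorted(group) for group in
            (all_three, cricket_and_football_not_hockey, exactly_two, none)]

C = [2, 5, 9, 12, 13, 15, 16, 17, 18, 19]
F = [2, 4, 5, 6, 7, 9, 13, 16]
H = [1, 2, 5, 9, 10, 11, 12, 13, 15]
for group in sports_report(C, F, H):
    print(group)
```

On the example input this reproduces the expected output listed above.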
import pandas as pd
rain=pd.read_csv(input1)
rain_g=rain.pivot_table(values='rain', index='COUNTRY', aggfunc='mean')
print(rain_g)
Missing Values removal
Description Count the number of missing values in each column of the dataset 'marks'.
import pandas as pd
marks = pd.read_csv('https://query.data.world/s/HqjNNadqEnwSq1qnoV_JqyRJkc7o6O')
print(marks.isnull().sum())
Prefix 0
Assignment 2
Tutorial 12
Midterm 16
TakeHome 9
Final 5
dtype: int64
Removing rows with missing values
Description Remove all the rows in the dataset 'marks' having 5 missing values and then print the number of missing values in each column.
import pandas as pd
marks = pd.read_csv('https://query.data.world/s/HqjNNadqEnwSq1qnoV_JqyRJkc7o6O')
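# thresh=2 keeps rows with at least 2 non-missing values; with 6 columns, this
# drops the rows that have 5 (or 6) missing values
marks = marks.dropna(thresh=2)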
print(marks.isnull().sum())
Prefix 0
Assignment 0
Tutorial 10
Midterm 14
TakeHome 7
Final 3
dtype: int64
import pandas as pd
df = pd.read_csv('https://query.data.world/s/HqjNNadqEnwSq1qnoV_JqyRJkc7o6O')
df = df[df.isnull().sum(axis=1) != 5]
print(df.isnull().sum())
Prefix 0
Assignment 0
Tutorial 10
Midterm 14
TakeHome 7
Final 3
dtype: int64
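Both approaches give the same counts because `thresh` counts non-missing values per row, while the second method counts missing ones. A minimal sketch on a small hypothetical frame where column 'a' is never missing (as 'Prefix' is in 'marks'), so "at least 2 non-missing" and "missing count != columns - 1" select the same rows:

```python
import numpy as np
import pandas as pd

# Hypothetical 3-column frame; column 'a' has no missing values
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [4, np.nan, np.nan],
    "c": [5, 6, np.nan],
})

# thresh=2 keeps rows with at least 2 non-missing values
kept_by_thresh = df.dropna(thresh=2)

# Same rows: drop those whose missing count equals number of columns - 1
kept_by_count = df[df.isnull().sum(axis=1) != 2]

print(kept_by_thresh.index.tolist())  # [0, 1]
```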
Removing extra characters from a column
Description The given data frame 'customer' has a column 'Cust_id' which has values Cust_1, Cust_2 and so on. Remove the repeated 'Cust_' prefix from the column Cust_id so that the output column Cust_id has just numbers like 1, 2, 3 and so on. Print the first 10 rows of the dataset 'customer' after processing.
#METHOD1
import pandas as pd
customer = pd.read_csv('https://query.data.world/s/y9rxL9mGdP6AXPiDaIL4yYm6DsfTV2')
customer['Cust_id'] =customer['Cust_id'].str.replace("Cust_",'')
print(customer.head(10))
Customer_Name Province Region Customer_Segment Cust_id
0 MUHAMMED MACINTYRE NUNAVUT NUNAVUT SMALL BUSINESS 1
1 BARRY FRENCH NUNAVUT NUNAVUT CONSUMER 2
2 CLAY ROZENDAL NUNAVUT NUNAVUT CORPORATE 3
3 CARLOS SOLTERO NUNAVUT NUNAVUT CONSUMER 4
4 CARL JACKSON NUNAVUT NUNAVUT CORPORATE 5
5 MONICA FEDERLE NUNAVUT NUNAVUT CORPORATE 6
6 DOROTHY BADDERS NUNAVUT NUNAVUT HOME OFFICE 7
7 NEOLA SCHNEIDER NUNAVUT NUNAVUT HOME OFFICE 8
8 CARLOS DALY NUNAVUT NUNAVUT HOME OFFICE 9
9 CLAUDIA MINER NUNAVUT NUNAVUT SMALL BUSINESS 10
#METHOD2
import pandas as pd
customer = pd.read_csv('https://query.data.world/s/y9rxL9mGdP6AXPiDaIL4yYm6DsfTV2')
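# str.strip('Cust_') would strip any of the characters C, u, s, t, _ from both ends
# rather than the prefix as a whole; replacing the first occurrence removes only the prefix
customer['Cust_id'] = customer['Cust_id'].map(lambda x: x.replace('Cust_', '', 1))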
print(customer.head(10))
Customer_Name Province Region Customer_Segment Cust_id
0 MUHAMMED MACINTYRE NUNAVUT NUNAVUT SMALL BUSINESS 1
1 BARRY FRENCH NUNAVUT NUNAVUT CONSUMER 2
2 CLAY ROZENDAL NUNAVUT NUNAVUT CORPORATE 3
3 CARLOS SOLTERO NUNAVUT NUNAVUT CONSUMER 4
4 CARL JACKSON NUNAVUT NUNAVUT CORPORATE 5
5 MONICA FEDERLE NUNAVUT NUNAVUT CORPORATE 6
6 DOROTHY BADDERS NUNAVUT NUNAVUT HOME OFFICE 7
7 NEOLA SCHNEIDER NUNAVUT NUNAVUT HOME OFFICE 8
8 CARLOS DALY NUNAVUT NUNAVUT HOME OFFICE 9
9 CLAUDIA MINER NUNAVUT NUNAVUT SMALL BUSINESS 10
Rounding decimal places of a column
Description The given dataframe 'sleepstudy' has a column 'Reaction' with floating-point values up to 4 decimal places. Round these values off to 1 decimal place.
# Method1
from pydataset import data
sleepstudy =data('sleepstudy')
sleepstudy['Reaction'] = sleepstudy['Reaction'].round(1)
print(sleepstudy.head(10))
Reaction Days Subject
1 249.6 0 308
2 258.7 1 308
3 250.8 2 308
4 321.4 3 308
5 356.9 4 308
6 414.7 5 308
7 382.2 6 308
8 290.1 7 308
9 430.6 8 308
10 466.4 9 308
#Method2
from pydataset import data
sleepstudy =data('sleepstudy')
sleepstudy['Reaction'] = sleepstudy['Reaction'].round(decimals=1)
print(sleepstudy.head(10))
Reaction Days Subject
1 249.6 0 308
2 258.7 1 308
3 250.8 2 308
4 321.4 3 308
5 356.9 4 308
6 414.7 5 308
7 382.2 6 308
8 290.1 7 308
9 430.6 8 308
10 466.4 9 308
Duplicated Rows
Description The given Dataframe 'rating' has repeated rows. You need to remove the duplicated rows.
import pandas as pd
rating = pd.read_csv('https://query.data.world/s/EX0EpmqwfA2UYGz1Xtd_zi4R0dQpog')
rating_update = rating.drop_duplicates()
print(rating.shape)
print(rating_update.shape)
(1254, 5)
(1149, 5)
Derived Variable
Description The given dataset 'cust_rating' has 3 rating columns, i.e. 'rating', 'food_rating' and 'service_rating'. Create a new variable 'avg_rating' containing the average of these three ratings, rounded to the nearest whole number.
import pandas as pd
cust_rating = pd.read_csv('https://query.data.world/s/ILc-P4llUraMaYN6N6Bdw7p6kUvHnj')
cust_rating['avg_rating'] = round( (cust_rating['rating']+ cust_rating['food_rating']+ cust_rating['service_rating'])/3)
print(cust_rating.head(10))
userID placeID rating food_rating service_rating avg_rating
0 U1077 135085 2 2 2 2.0
1 U1077 135038 2 2 1 2.0
2 U1077 132825 2 2 2 2.0
3 U1077 135060 1 2 2 2.0
4 U1068 135104 1 1 2 1.0
5 U1068 132740 0 0 0 0.0
6 U1068 132663 1 1 1 1.0
7 U1068 132732 0 0 0 0.0
8 U1068 132630 1 1 1 1.0
9 U1067 132584 2 2 2 2.0
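Instead of adding the three columns by hand, `mean(axis=1)` averages across columns for each row, which scales better if more rating columns are added. A minimal sketch on a few rating rows copied from the output above (the frame itself is hypothetical, built in memory rather than read from the URL):

```python
import pandas as pd

# A few rows taken from the printed output above
cust_rating = pd.DataFrame({
    "rating": [2, 2, 1, 1],
    "food_rating": [2, 2, 2, 1],
    "service_rating": [2, 1, 2, 2],
})

rating_cols = ["rating", "food_rating", "service_rating"]
# mean(axis=1) averages across the listed columns of each row;
# round() then rounds to the nearest whole number
cust_rating["avg_rating"] = cust_rating[rating_cols].mean(axis=1).round()
print(cust_rating)
```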
Extracting Day From a Date
Description The given dataset 'order' has a variable 'Order_Date' with the dates of purchase. Create a new variable 'day' which will contain the day of the month from the variable Order_Date.
import pandas as pd
order = pd.read_csv('https://query.data.world/s/3hIAtsCE7vYkPEL-O5DyWJAeS5Af-7')
order['Order_Date'] = pd.to_datetime(order['Order_Date'])
order['day'] = order['Order_Date'].dt.day
print(order.head(10))
Order_ID Order_Date Order_Priority Ord_id day
0 3 2010-10-13 LOW Ord_1 13
1 293 2012-01-10 HIGH Ord_2 10
2 483 2011-10-07 HIGH Ord_3 7
3 515 2010-08-28 NOT SPECIFIED Ord_4 28
4 613 2011-06-17 HIGH Ord_5 17
5 643 2011-03-24 HIGH Ord_6 24
6 678 2010-02-26 LOW Ord_7 26
7 807 2010-11-23 MEDIUM Ord_8 23
8 868 2012-08-06 NOT SPECIFIED Ord_9 6
9 933 2012-04-08 NOT SPECIFIED Ord_10 8
import pandas as pd
order = pd.read_csv('https://query.data.world/s/3hIAtsCE7vYkPEL-O5DyWJAeS5Af-7')
order['Order_Date'] = pd.to_datetime(order['Order_Date'])
order['day'] = order['Order_Date'].apply(lambda x: x.day)
print(order.head(10))
Order_ID Order_Date Order_Priority Ord_id day
0 3 2010-10-13 LOW Ord_1 13
1 293 2012-01-10 HIGH Ord_2 10
2 483 2011-10-07 HIGH Ord_3 7
3 515 2010-08-28 NOT SPECIFIED Ord_4 28
4 613 2011-06-17 HIGH Ord_5 17
5 643 2011-03-24 HIGH Ord_6 24
6 678 2010-02-26 LOW Ord_7 26
7 807 2010-11-23 MEDIUM Ord_8 23
8 868 2012-08-06 NOT SPECIFIED Ord_9 6
9 933 2012-04-08 NOT SPECIFIED Ord_10 8
Python Program for factorial of a number
def fact(x):
    if x == 0:
        return 1
    else:
        return x * fact(x - 1)
print(fact(5))
120
def fact(n):
    return 1 if (n == 1 or n == 0) else n * fact(n - 1)
n = 5
print("Factorial of " + str(n) + " =", fact(n))
Factorial of 5 = 120
Find the sum of a square series
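The cells for this exercise are hidden. Assuming the task is the series 1² + 2² + ... + n², a minimal sketch computes it with a generator expression and cross-checks it against the closed form n(n + 1)(2n + 1)/6:

```python
def sum_of_squares(n):
    """Return 1**2 + 2**2 + ... + n**2 for a natural number n."""
    return sum(i ** 2 for i in range(1, n + 1))

def sum_of_squares_formula(n):
    # Closed form n(n + 1)(2n + 1) / 6; the product is always divisible by 6
    return n * (n + 1) * (2 * n + 1) // 6

print(sum_of_squares(3), sum_of_squares_formula(3))  # 14 14
```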
Separate Letters from String
Python program to convert time from 12 hour to 24 hour format
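The cells for this exercise are hidden. Assuming the input looks like "hh:mm:ss AM/PM", a minimal sketch uses the standard library's `datetime.strptime` to parse the 12-hour time and `strftime` to format it on a 24-hour clock (`to_24_hour` is a hypothetical name):

```python
from datetime import datetime

def to_24_hour(time_12h):
    """Convert a time like '07:05:45 PM' to '19:05:45'."""
    # %I = hour on a 12-hour clock, %p = AM/PM, %H = hour on a 24-hour clock
    parsed = datetime.strptime(time_12h, "%I:%M:%S %p")
    return parsed.strftime("%H:%M:%S")

print(to_24_hour("07:05:45 PM"))  # 19:05:45
print(to_24_hour("12:00:00 AM"))  # 00:00:00
```

Letting `strptime` handle the AM/PM logic avoids the usual edge cases around 12 AM and 12 PM.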
Ordered and Unordered Categorical Variables
Categorical variables can be of two types: ordered categorical and unordered categorical. For an unordered categorical variable, it is not possible to say that a certain category is 'more or less' or 'higher or lower' than others. For example, color is such a variable (red is not greater than or more than green, etc.).
On the other hand, ordered categories have a notion of 'higher-lower', 'before-after', 'more-less' etc. For example, the age-group variable with three values (child, adult and old) is ordered categorical, because an old person is 'more aged' than an adult, and so on. In general, it is possible to define some kind of ordering among the categories.
The months in a year - Jan, Feb, March etc. Feedback : Months have an element of ordering - Jan comes before April, Dec comes after everything else etc. In general, all dates are ordered categorical variables (day 23 comes after day 11 of the month etc.)
Unordered Categorical Variables - Univariate Analysis
You have worked with some unordered categorical variables in the past, for example:
The Product_Category in the retail sales dataset
The Customer_Segment in the retail sales dataset
The name of a batsman in any of the cricket datasets
Now imagine someone (say a client) gives you only an unordered categorical variable (and nothing else!), such as a column of 4000 rows named 'country_of_person' with 130 unique countries, and asks you 'can you extract anything useful from just this one variable?'.
Write down how you would analyse just that variable to get something meaningful out of it. Note that you have only one column to analyse.
The only thing you can do with an unordered categorical variable is to count the frequency of each category in the column. For example, you could observe that the product category 'furniture' appears 1000 times, 'technology' appears 810 times and so on.
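In pandas, this frequency count is `value_counts()`, and `normalize=True` turns the counts into proportions. A minimal sketch on a small hypothetical column (the values are invented for illustration):

```python
import pandas as pd

# Hypothetical single categorical column
product_category = pd.Series(
    ["Furniture", "Technology", "Furniture", "Office Supplies", "Furniture", "Technology"]
)

counts = product_category.value_counts()                # frequency of each category
shares = product_category.value_counts(normalize=True)  # fraction of rows per category
print(counts)
print(shares.round(2))
```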
Ordered Categorical Variables
You have already worked with ordered categorical variables before: there is a certain order or notion of 'high-low', 'before-after' etc. among the categories. For example, days of the week (Monday comes before Tuesday), grades of students (A is better than B), number of overs bowled by a bowler (3, 4, 9) etc.
Which of the following are other examples of ordered categorical variables? Choose all the correct options.
Dates in a year e.g. Jan 2, Mar 15 etc. Feedback : Dates are ordered - each day comes before or after other days. Correct
Star rating of a restaurant on Zomato on a scale of 1-5 Feedback : A rating of 5 is better than 4, 3, 2, 1.
Numeric and Ordered Categorical Variables
Anand mentioned that you can treat numeric variables as ordered categorical variables. For analysis, you can deliberately convert numeric variables into ordered categorical ones. For example, if you have the incomes of a few thousand people ranging from 5,000 to 100,000, you can categorise them into bins such as [5000, 10000], [10000, 15000], [15000, 20000] and so on.
This is called 'binning'.
Which of the following variables can be binned into ordered categorical variables? Mark all the correct options.
The temperature in a city over a certain time period Feedback : You can bin the temperatures as [0, 10 degrees], [10, 20 degrees] etc. Correct
The revenue generated per day of a company Feedback : This can also be binned e.g. [0, 10k], [10k, 20k] etc.
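In pandas, binning is usually done with `pd.cut`, which returns an ordered categorical. Note that `pd.cut` uses right-closed intervals such as (5000, 10000] by default, so the edges differ slightly from the bracket notation above. A minimal sketch on hypothetical incomes:

```python
import pandas as pd

# Hypothetical incomes
incomes = pd.Series([5200, 9800, 12000, 14999, 17500])

# Bins (5000, 10000], (10000, 15000], (15000, 20000]
income_bins = pd.cut(incomes, bins=[5000, 10000, 15000, 20000])
print(income_bins)
print(income_bins.cat.ordered)  # True: the bins keep their numeric order
```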
import numpy as np
import pandas as pd
np.random.seed(1234)
df = pd.DataFrame(np.random.randn(10, 4),
columns=['Col1', 'Col2', 'Col3', 'Col4'])
boxplot = df.boxplot(column=['Col1', 'Col2', 'Col3','Col4'])
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Put dataset on my github repo
df = pd.read_csv('https://raw.githubusercontent.com/mGalarnyk/Python_Tutorials/master/Kaggle/BreastCancerWisconsin/data/data.csv')
df.head(5)
sns.boxplot(x='diagnosis', y='area_mean', data=df)
Alarm Clock
You're trying to automate your alarm clock by writing a function for it. You're given a day of the week encoded as 1=Mon, 2=Tue, ... 6=Sat, 7=Sun, and a boolean value (a boolean object is either True or False; Google "booleans python" to get a better understanding) indicating whether you're on vacation. Based on the day and whether you're on vacation, write a function that returns a string indicating the time at which the alarm clock should ring.
When not on a vacation, on weekdays, the alarm should ring at "7:00" and on the weekends (Saturday and Sunday) it should ring at "10:00".
While on a vacation, it should ring at "10:00" on weekdays. On vacation, it should not ring on weekends, that is, it should return "off".
Sample input (a list): [7,True]
Sample output (a string):
off
Sample input (a list): [3,True]
Sample output (a string):
10:00
import ast,sys
input_list = [1,False]
day_of_the_week = input_list[0]
is_on_vacation = input_list[1]
# write your code here
def alarm_clock(day, vacation):
    is_weekday = 1 <= day <= 5
    if is_weekday and vacation:
        return '10:00'
    elif is_weekday and not vacation:
        return '7:00'
    elif not is_weekday and not vacation:
        # weekends without vacation ring at 10:00, not 'off'
        return '10:00'
    else:
        # weekends on vacation
        return 'off'
# do not change the following code
time = alarm_clock(day_of_the_week, is_on_vacation)
print(time.lower())
7:00
import ast,sys
input_str = sys.stdin.read()
input_list = [1,False]
day_of_the_week = int(input_list[0])
is_on_vacation = input_list[1]
# write your code here
def alarm_clock(day, vacation):
    weekends = [6, 7]
    if vacation and day not in weekends:
        return "10:00"
    elif vacation and day in weekends:
        return "off"
    elif vacation == False and day not in weekends:
        return "7:00"
    elif vacation == False and day in weekends:
        return "10:00"
# do not change the following code
time = alarm_clock(day_of_the_week, is_on_vacation)
print(time.lower())
7:00
You're given a list of non-negative integers. Your task is to round the given numbers to the nearest multiple of 10. For instance, 15 should be rounded to 20 whereas 14 should be rounded to 10. After rounding the numbers, find their sum.
Hint: Python's built-in round() function rounds halfway cases to the nearest even digit; for example, round(0.25, 1) gives 0.2, not 0.3. You might want to write your own function to round as per your requirement.
Sample input (a list): [2, 18, 10]
Sample output (an integer): 30
import ast,sys
import math
input_str = sys.stdin.read()
input_list = [2, 18, 10]
# write code here
# rounds to nearest, ties away from zero
def custom_round(n, ndigits=1):
    """
    Takes in any decimal number and outputs rounded number
    examples:
    0.25 is rounded to 0.3
    0.35 is rounded to 0.4
    0.21 is rounded to 0.2
    """
    part = n * 10 ** ndigits
    delta = part - int(part)
    # round to nearest, ties away from zero
    if delta >= 0.5:
        part = math.ceil(part)
    else:
        part = math.floor(part)
    return part / (10 ** ndigits)
def round_to_nearest_10(n):
    """ takes in 15 and outputs 20"""
    return int(100*custom_round(n/100, 1))
rounded_list = list(map(round_to_nearest_10, input_list))
result = sum(rounded_list)
# do not change the following code
print(result)
30
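Since the inputs are non-negative integers, rounding to the nearest multiple of 10 (halves going up) can also be done with integer arithmetic alone, which avoids the floating-point scaling in `custom_round`. A minimal sketch:

```python
def round_to_nearest_10(n):
    """Round a non-negative integer to the nearest multiple of 10, halves going up."""
    # Adding 5 before floor division maps 0..4 -> 0, 5..14 -> 10, 15..24 -> 20, ...
    return (n + 5) // 10 * 10

input_list = [2, 18, 10]
print(sum(map(round_to_nearest_10, input_list)))  # 30
```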
Sum and Squares
You're given a natural number 'n'. First, calculate the sum of squares of all the natural numbers up to 'n'. Then calculate the square of the sum of all natural numbers up to 'n'. Return the absolute difference of these two quantities.
For instance, if n=3, then natural numbers up to 3 are: 1, 2 and 3. The sum of squares up to 3 will be 1^2 + 2^2 + 3^2 = 14. The square of the sum of natural numbers up to 3 is (1+2+3)^2=36. The result, which is their absolute difference is 22.
Sample input (an integer): 3
Sample output (an integer): 22
n = 3
def sum2(n):
    # sum of squares: 1^2 + 2^2 + ... + n^2
    s=0
    for i in range(n+1):
        s=s+i**2
    return s
def sum1(n):
    # square of the sum: (1 + 2 + ... + n)^2
    s=0
    for i in range(n+1):
        s=s+i
    return s**2
# store the result in the following variable
abs_difference = abs(sum1(n)-sum2(n))
# print result --- do not change the following code
print(abs_difference)
22
import ast,sys
input_str = sys.stdin.read()
n = 3
# write your code here
numbers = [number+1 for number in range(n)]
sum_of_squares = sum(list(map(lambda x: x**2, numbers)))
square_of_sum = sum(numbers)**2
# store the result in the following variable
abs_difference = abs(sum_of_squares - square_of_sum)
# print result --- do not change the following code
print(abs_difference)
22
def reverse(s):
    result = ""
    for ch in s:
        # prepend each character, so the last one ends up first
        result = ch + result
    return result
s = ['ram', 'krishn','mishra']
rev=list(map(reverse,s))
print(rev)
['mar', 'nhsirk', 'arhsim']
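The same result can be obtained with slicing: a slice with a step of -1 walks the string backwards, so no helper function is needed. A minimal sketch:

```python
s = ['ram', 'krishn', 'mishra']
# word[::-1] returns a reversed copy of each string
rev = [word[::-1] for word in s]
print(rev)  # ['mar', 'nhsirk', 'arhsim']
```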
Weird Function
In data science, quite often you need to implement research papers and write code according to what's present in those papers. Research papers have a lot of maths involved and you need to implement the maths in code. In this exercise, you're required to implement some maths in code. The problem is as follows:
For fixed integers a, b, c, define a weird function F(n) as follows:
F(n) = n - c, for all n > b
F(n) = F(a + F(a + F(a + F(a + n)))), for all n ≤ b
Also, define S(a, b, c) = ∑F(n) where n takes the values 0 till b [in other words, S(a, b, c) = F(0) + F(1) + F(2) + .... F(b-1) + F(b)].
The input will be the value of a, b and c. The output should be S(a, b, c). You can define the functions in your own customized way with no restrictions on the number of parameters. For example, you can define the function S which can take additional parameters than a, b and c. Just make sure the code behaves as per the maths.
For example, if a = 20, b = 100 and c = 15, then F(0) = 195 and F(2000) = 1985. Therefore, S(20, 100, 15) = 14245
input_list = [20,100,15]
a = input_list[0]
b = input_list[1]
c = input_list[2]
# write code here
def weird_function(a,b,c,n):
    if n>b:
        return n-c
    else:
        return weird_function(a,b,c,a+weird_function(a,b,c,a+weird_function(a,b,c,a+weird_function(a,b,c,a+n))))
def large_sum(a, b, c):
    # S(a, b, c) = F(0) + F(1) + ... + F(b)
    total = 0
    for value in range(b+1):
        total += weird_function(a, b, c, value)
    return total
# store the result in the following variable
result = large_sum(a, b, c)
# print result -- do not change the following code
print(result)
14245
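The recursive solution recomputes F for the same arguments many times. A minimal sketch that caches F with `functools.lru_cache`; `weird_sum` is a hypothetical name, and the sketch assumes parameters like the example's (a = 20, b = 100, c = 15), for which the recursion terminates:

```python
from functools import lru_cache

def weird_sum(a, b, c):
    """Return S(a, b, c) = F(0) + F(1) + ... + F(b)."""

    @lru_cache(maxsize=None)
    def F(n):
        if n > b:
            return n - c
        # Nested definition F(a + F(a + F(a + F(a + n)))) for n <= b
        return F(a + F(a + F(a + F(a + n))))

    return sum(F(n) for n in range(b + 1))

print(weird_sum(20, 100, 15))  # 14245
```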
Python Program to check Armstrong Number
def digit(n):
    # count the number of digits in n
    count=0
    while n!=0:
        n=n//10
        count=count+1
    return count
def armstrong(n,d):
    # sum of each digit of n raised to the power d
    total=0
    while n!=0:
        rem=n%10
        total=total+(pow(rem,d))
        n=n//10
    return total
n=int(input('Enter number'))
print(armstrong(n,digit(n)))
if(n==armstrong(n,digit(n))):
    print('Number is Armstrong')
else:
    print('Number is Not Armstrong')
Enter number1234
354
Number is Not Armstrong
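An alternative is to work with the digits of the number's string form, which gives both the digit count and the digits directly. A minimal sketch for non-negative integers (`is_armstrong` is a hypothetical name):

```python
def is_armstrong(n):
    """True if n equals the sum of its digits, each raised to the number of digits."""
    digits = [int(d) for d in str(n)]
    power = len(digits)
    return n == sum(d ** power for d in digits)

# Armstrong numbers below 1000
print([n for n in range(1, 1000) if is_armstrong(n)])
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 153, 370, 371, 407]
```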
Swap two rows
Description Given m and n, swap the mth and nth rows of the 2-D NumPy array given below.
a = [[4 3 1] [5 7 0] [9 9 3] [8 2 4]]
import numpy as np
# Given array
a = np.array([[4, 3, 1], [5, 7, 0], [9, 9, 3], [8, 2, 4]])
# Read the values of m and n
m = 0
n = 2
a[[m,n]]=a[[n,m]]
# Print the array after swapping
print(a)
[[9 9 3]
[5 7 0]
[4 3 1]
[8 2 4]]
Create border array
Description Given a single integer n, create an (n x n) 2D array with 1 on the border and 0 on the inside.
Note: Make sure the array is of type int.
Example:
Input 1: 4
Output 1: [[1 1 1 1] [1 0 0 1] [1 0 0 1] [1 1 1 1]]
Input 2: 2
Output 2: [[1 1] [1 1]]
# Read the variable from STDIN
n = int(input())
import numpy as np
a=np.ones((n,n), dtype=int)
a[1:-1,1:-1] = 0
print(a)
3
[[1 1 1]
[1 0 1]
[1 1 1]]
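The same array can be built the other way round: start from an (n-2) x (n-2) block of zeros and pad it with a one-element-wide frame of ones using `np.pad`. A minimal sketch, assuming n >= 2 (`border_array` is a hypothetical name):

```python
import numpy as np

def border_array(n):
    """n x n int array with 1 on the border and 0 inside (assumes n >= 2)."""
    inner = np.zeros((n - 2, n - 2), dtype=int)
    # Pad the zero block with a frame of ones, 1 element wide on every side
    return np.pad(inner, pad_width=1, mode="constant", constant_values=1)

print(border_array(4))
```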
Set Index in Dataframe
Description Using set_index command set the column 'X' as the index of the dataset and then print the head of the dataset. Hint: Use inplace = False
import pandas as pd
df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')
df_2 = df.set_index('X',inplace=False)
print(df_2.head())
Y month day FFMC DMC DC ISI temp RH wind rain area
X
7 5 mar fri 86.2 26.2 94.3 5.1 8.2 51 6.7 0.0 0.0
7 4 oct tue 90.6 35.4 669.1 6.7 18.0 33 0.9 0.0 0.0
7 4 oct sat 90.6 43.7 686.9 6.7 14.6 33 1.3 0.0 0.0
8 6 mar fri 91.7 33.3 77.5 9.0 8.3 97 4.0 0.2 0.0
8 6 mar sun 89.3 51.3 102.2 9.6 11.4 99 1.8 0.0 0.0
Sorting Dataframes
Description Sort the dataframe on 'month' and 'day' in ascending order in the dataframe 'df'.
import pandas as pd
df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')
df_2 = df.sort_values(by=['month','day'],ascending=True)
print(df_2.head(20))
X Y month day FFMC DMC DC ISI temp RH wind rain area
241 4 4 apr fri 83.0 23.3 85.3 2.3 16.7 20 3.1 0.0 0.00
442 6 5 apr mon 87.9 24.9 41.6 3.7 10.9 64 3.1 0.0 3.35
19 6 4 apr sat 86.3 27.4 97.1 5.1 9.3 44 4.5 0.0 0.00
239 7 5 apr sun 81.9 3.0 7.9 3.5 13.4 75 1.8 0.0 0.00
469 6 3 apr sun 91.0 14.6 25.6 12.3 13.7 33 9.4 0.0 61.13
470 5 4 apr sun 91.0 14.6 25.6 12.3 17.6 27 5.8 0.0 0.00
176 6 5 apr thu 81.5 9.1 55.2 2.7 5.8 54 5.8 0.0 4.61
196 6 5 apr thu 81.5 9.1 55.2 2.7 5.8 54 5.8 0.0 10.93
240 6 3 apr wed 88.0 17.2 43.5 3.8 15.2 51 2.7 0.0 0.00
12 6 5 aug fri 63.5 70.8 665.3 0.8 17.0 72 6.7 0.0 0.00
78 1 2 aug fri 90.1 108.0 529.8 12.5 14.7 66 2.7 0.0 0.00
142 8 6 aug fri 90.1 108.0 529.8 12.5 21.2 51 8.9 0.0 0.61
184 8 6 aug fri 93.9 135.7 586.7 15.1 20.8 34 4.9 0.0 6.96
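`sort_values` sorts the 'month' and 'day' strings alphabetically, which is why 'apr' comes before 'aug'. To sort in calendar order, the columns can be converted to ordered categoricals first. A minimal sketch on a few hypothetical rows using the dataset's lowercase abbreviations:

```python
import pandas as pd

month_order = ["jan", "feb", "mar", "apr", "may", "jun",
               "jul", "aug", "sep", "oct", "nov", "dec"]
day_order = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]

# Hypothetical rows
df = pd.DataFrame({"month": ["aug", "apr", "mar", "apr"],
                   "day": ["fri", "sun", "mon", "fri"]})

# Ordered categoricals sort by the given category order, not alphabetically
df["month"] = pd.Categorical(df["month"], categories=month_order, ordered=True)
df["day"] = pd.Categorical(df["day"], categories=day_order, ordered=True)
df_sorted = df.sort_values(by=["month", "day"])
print(df_sorted)
```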
DataFrames
Description Given a dataframe 'df', use the following commands and analyse the result: describe(), columns and shape.
import numpy as np
import pandas as pd
df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')
print(df.describe())
print(df.columns)
print(df.shape)
X Y FFMC ... wind rain area
count 517.000000 517.000000 517.000000 ... 517.000000 517.000000 517.000000
mean 4.669246 4.299807 90.644681 ... 4.017602 0.021663 12.847292
std 2.313778 1.229900 5.520111 ... 1.791653 0.295959 63.655818
min 1.000000 2.000000 18.700000 ... 0.400000 0.000000 0.000000
25% 3.000000 4.000000 90.200000 ... 2.700000 0.000000 0.000000
50% 4.000000 4.000000 91.600000 ... 4.000000 0.000000 0.520000
75% 7.000000 5.000000 92.900000 ... 4.900000 0.000000 6.570000
max 9.000000 9.000000 96.200000 ... 9.400000 6.400000 1090.840000
[8 rows x 11 columns]
Index(['X', 'Y', 'month', 'day', 'FFMC', 'DMC', 'DC', 'ISI', 'temp', 'RH',
'wind', 'rain', 'area'],
dtype='object')
(517, 13)
Indexing Dataframes
Description Print only the even-numbered rows of the dataframe 'df'.
Note: Don't include the row indexed zero.
import pandas as pd
df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')
df_2 = df[2::2]
print(df_2.head(20))
X Y month day FFMC DMC DC ISI temp RH wind rain area
2 7 4 oct sat 90.6 43.7 686.9 6.7 14.6 33 1.3 0.0 0.0
4 8 6 mar sun 89.3 51.3 102.2 9.6 11.4 99 1.8 0.0 0.0
6 8 6 aug mon 92.3 88.9 495.6 8.5 24.1 27 3.1 0.0 0.0
Selecting Columns of a Dataframe
Description Print out the columns 'month', 'day', 'temp', 'area' from the dataframe 'df'.
import pandas as pd
df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')
df_2 = df[['month','day','temp','area']]
print(df_2.head(20))
month day temp area
0 mar fri 8.2 0.0
1 oct tue 18.0 0.0
2 oct sat 14.6 0.0
3 mar fri 8.3 0.0
Dataframe iloc
Description Using iloc, index the dataframe to print all the rows of the columns at positions 3, 4 and 5. Hint: Use 3, 4, 5, not 2, 3, 4.
import pandas as pd
df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')
df_2 = df.iloc[:,[3,4,5]]
print(df_2.head(5))
day FFMC DMC
0 fri 86.2 26.2
1 tue 90.6 35.4
2 sat 90.6 43.7
Dataframes loc
Description Using loc function print out all the columns and rows from 2 to 20 of the 'df' dataset.
import pandas as pd
df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')
# loc slices by label, so 2:20 includes both row 2 and row 20
df_2 = df.loc[2:20,:]
print(df_2)
X Y month day FFMC DMC DC ISI temp RH wind rain area
2 7 4 oct sat 90.6 43.7 686.9 6.7 14.6 33 1.3 0.0 0.0
3 8 6 mar fri 91.7 33.3 77.5 9.0 8.3 97 4.0 0.2 0.0
4 8 6 mar sun 89.3 51.3 102.2 9.6 11.4 99 1.8 0.0 0.0
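A common pitfall is that `loc` and `iloc` treat the end of a slice differently: `loc` is label-based and includes the end label, while `iloc` is position-based and excludes it, like Python list slicing. A minimal sketch on a small hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 11, 12, 13, 14, 15, 16]})

# loc slices by label and includes the end label
by_label = df.loc[2:5, :]
# iloc slices by position and excludes the end position
by_position = df.iloc[2:5, :]

print(by_label.index.tolist())     # [2, 3, 4, 5]
print(by_position.index.tolist())  # [2, 3, 4]
```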
Applying Conditions on Dataframes
Description Print all the columns and the rows where 'area' is greater than 0, 'wind' is greater than 1 and the 'temp' is greater than 15.
import pandas as pd
df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')
df_2 = df.loc[(df.area>0)&(df.wind>1)&(df.temp>15),:]
print(df_2.head(5))
X Y month day FFMC DMC DC ISI temp RH wind rain area
138 9 9 jul tue 85.8 48.3 313.4 3.9 18.0 42 2.7 0.0 0.36
139 1 4 sep tue 91.0 129.5 692.6 7.0 21.7 38 2.2 0.0 0.43
140 2 5 sep mon 90.9 126.5 686.5 7.0 21.9 39 1.8 0.0 0.47
Dataframes Merge
Description Perform an inner merge on two data frames df_1 and df_2 on 'unique_id' and print the combined dataframe.
import pandas as pd
df_1 = pd.read_csv('https://query.data.world/s/vv3snq28bp0TJq2ggCdxGOghEQKPZo')
df_2 = pd.read_csv('https://query.data.world/s/9wVKjNT0yiRc3YbVJaiI8a6HGl2d74')
df_3 = pd.merge(df_1, df_2, how='inner', on='unique_id')
print(df_3.head(2))
Dataframe Append
Description Append two datasets df_1 and df_2, and print the combined dataframe.
import pandas as pd
df_1 = pd.read_csv('https://query.data.world/s/vv3snq28bp0TJq2ggCdxGOghEQKPZo')
df_2 = pd.read_csv('https://query.data.world/s/9wVKjNT0yiRc3YbVJaiI8a6HGl2d74')
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows of both frames
df_3 = pd.concat([df_1, df_2])
print(df_3.head())
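DataFrame.append was deprecated in pandas 1.4 and removed in 2.0. pd.concat is its replacement for stacking rows. The sketch below stacks two hypothetical frames and shows the effect of ignore_index:

```python
import pandas as pd

# Hypothetical frames with the same columns
df_a = pd.DataFrame({'unique_id': [1, 2], 'value': [10, 20]})
df_b = pd.DataFrame({'unique_id': [3, 4], 'value': [30, 40]})

# Stack the rows; by default each frame keeps its own index labels
stacked = pd.concat([df_a, df_b])
# ignore_index=True renumbers the combined rows 0..n-1
renumbered = pd.concat([df_a, df_b], ignore_index=True)

print(stacked.index.tolist())     # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```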
Operations on multiple dataframes
Description Given three dataframes containing the number of gold, silver, and bronze Olympic medals won by some countries, determine the total number of medals won by each country. Note: the three dataframes do not all contain the same countries, so use the 'fill_value' argument (set to zero) to avoid getting NaN values. Also, sort the final dataframe by total medal count in descending order.
import numpy as np
import pandas as pd
# Defining the three dataframes indicating the gold, silver, and bronze medal counts
# of different countries
gold = pd.DataFrame({'Country': ['USA', 'France', 'Russia'],
                     'Medals': [15, 13, 9]})
silver = pd.DataFrame({'Country': ['USA', 'Germany', 'Russia'],
                       'Medals': [29, 20, 16]})
bronze = pd.DataFrame({'Country': ['France', 'USA', 'UK'],
                       'Medals': [40, 28, 27]})
# Set the index of the dataframes to 'Country' so that you can get the countrywise
# medal count
gold.set_index('Country', inplace = True)
silver.set_index('Country', inplace = True)
bronze.set_index('Country', inplace = True)
# Add the three dataframes and set the fill_value argument to zero to avoid getting
# NaN values
total = gold.add(silver, fill_value = 0).add(bronze, fill_value = 0)
# Sort the resultant dataframe in a descending order
total = total.sort_values(by = 'Medals', ascending = False)
# Print the sorted dataframe
print(total)
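As a cross-check, a different technique gives the same totals. This is not the fill_value approach the exercise asks for. It stacks the three frames with pd.concat and sums per country with groupby. A country absent from one frame simply contributes no row there, so no NaN appears. Here is a sketch using the same medal counts:

```python
import pandas as pd

gold = pd.DataFrame({'Country': ['USA', 'France', 'Russia'], 'Medals': [15, 13, 9]})
silver = pd.DataFrame({'Country': ['USA', 'Germany', 'Russia'], 'Medals': [29, 20, 16]})
bronze = pd.DataFrame({'Country': ['France', 'USA', 'UK'], 'Medals': [40, 28, 27]})

# Stack all rows, sum the medals per country, then sort in descending order
total = (
    pd.concat([gold, silver, bronze])
    .groupby('Country')['Medals']
    .sum()
    .sort_values(ascending=False)
)
print(total)
```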
Dataframe grouping
Description Group the data 'df' by 'month' and 'day' and find the mean of the columns 'rain' and 'wind'.
import pandas as pd
df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')
df_md=df.groupby(['month','day'])
df_1 = df_md[['rain', 'wind']].mean()  # pass the columns as a list
print(df_1.head())
rain wind
month day
apr fri 0.0 3.100000
mon 0.0 3.100000
sat 0.0 4.500000
sun 0.0 5.666667
thu 0.0 5.800000
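To select several columns from a grouped frame, pass them as a list (double brackets). pandas 2.0 rejects the tuple form df_md['rain','wind'] with a ValueError. Here is a sketch with a few hypothetical rows:

```python
import pandas as pd

# Hypothetical rows with the same column names as the forest-fire data
toy = pd.DataFrame({
    'month': ['apr', 'apr', 'apr', 'aug'],
    'day': ['fri', 'fri', 'mon', 'fri'],
    'rain': [0.0, 0.0, 0.0, 0.8],
    'wind': [3.0, 3.2, 3.1, 4.0],
})

# Group on two keys, then select the two value columns with a list
means = toy.groupby(['month', 'day'])[['rain', 'wind']].mean()
print(means)
```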
Creating New Column in a Dataframe
Description Create a new column 'XY' whose values are obtained by multiplying column 'X' by column 'Y'.
import pandas as pd
df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')
df['XY'] = df['X']*df['Y']
print(df.head())
X Y month day FFMC DMC DC ISI temp RH wind rain area XY
0 7 5 mar fri 86.2 26.2 94.3 5.1 8.2 51 6.7 0.0 0.0 35
1 7 4 oct tue 90.6 35.4 669.1 6.7 18.0 33 0.9 0.0 0.0 28
2 7 4 oct sat 90.6 43.7 686.9 6.7 14.6 33 1.3 0.0 0.0 28
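Column arithmetic in pandas is element-wise, so each row of 'XY' is that row's X times its Y. DataFrame.assign gives the same result but returns a new frame instead of modifying the original. Here is a sketch with the X and Y values from the first three rows of the output above:

```python
import pandas as pd

toy = pd.DataFrame({'X': [7, 7, 7], 'Y': [5, 4, 4]})

# In-place column creation: row i of XY is X[i] * Y[i]
toy['XY'] = toy['X'] * toy['Y']

# assign() computes the same column on a fresh frame and returns a new DataFrame
toy2 = pd.DataFrame({'X': [7, 7, 7], 'Y': [5, 4, 4]}).assign(XY=lambda d: d['X'] * d['Y'])

print(toy['XY'].tolist())  # [35, 28, 28]
```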
Dataframe Pivot Table
Description Group the data 'df' by 'month' and 'day' and find the mean of the columns 'rain' and 'wind' using the pivot_table function.
import numpy as np
import pandas as pd
df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')
df_1=pd.pivot_table(df, values=['rain','wind'], index=['month', 'day'], aggfunc='mean')
print(df_1.head(10))
rain wind
month day
apr fri 0.000000 3.100000
mon 0.000000 3.100000
sat 0.000000 4.500000
sun 0.000000 5.666667
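With index set and aggfunc='mean', pivot_table performs the same aggregation as the groupby exercise above. Here is a sketch on hypothetical rows that compares the two tables:

```python
import pandas as pd

# Hypothetical rows with the same column names as the forest-fire data
toy = pd.DataFrame({
    'month': ['apr', 'apr', 'aug'],
    'day': ['fri', 'fri', 'sun'],
    'rain': [0.0, 0.0, 0.2],
    'wind': [3.0, 3.2, 5.0],
})

# Same aggregation expressed two ways
pivot = pd.pivot_table(toy, values=['rain', 'wind'], index=['month', 'day'], aggfunc='mean')
grouped = toy.groupby(['month', 'day'])[['rain', 'wind']].mean()

print(pivot.equals(grouped))  # True
```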
Missing Values
Description Print out the number of missing values in each column in the given dataframe.
import pandas as pd
df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
print(df.isnull().sum())
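isnull() returns a frame of True/False flags, and summing it counts the True values in each column. Here is a sketch on a hypothetical frame that reuses some of this dataset's column names:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few missing entries
toy = pd.DataFrame({
    'Ord_id': [1, 2, 3, 4],
    'Profit': [10.0, np.nan, 5.0, np.nan],
    'Shipping_Cost': [1.5, 2.0, np.nan, 3.0],
})

# True counts as 1, so the column sums are the missing-value counts
missing_per_column = toy.isnull().sum()
print(missing_per_column)
```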
Missing Values Percentage
Description Find the percentage of missing values in each column of the given dataset.
import pandas as pd
df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
print(round(100*df.isnull().sum()/len(df.index),2))
Ord_id 0.00
Profit 0.65
Shipping_Cost 0.65
Product_Base_Margin 1.30
dtype: float64
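True counts as 1, so the mean of the isnull() flags is already the fraction of missing values. Multiplying by 100 therefore gives the same percentage as dividing the sum by the row count. Here is a sketch with a hypothetical three-row frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one of the three 'Profit' values is missing
toy = pd.DataFrame({
    'Ord_id': [1, 2, 3],
    'Profit': [10.0, np.nan, 5.0],
    'Shipping_Cost': [1.5, 2.0, 3.0],
})

# Mean of the boolean flags = fraction missing
pct = (toy.isnull().mean() * 100).round(2)
# The form used in the exercise, for comparison
pct_long = round(100 * toy.isnull().sum() / len(toy.index), 2)

print(pct)
```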
Removing Missing Values From the Rows
Description Remove the rows that have more than 5 missing values, then print the percentage of missing values in each column.
import pandas as pd
df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
df = df[df.isnull().sum(axis=1)<=5]
print(round(100*df.isnull().sum()/len(df.index),2))
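The same filter can be written with dropna(thresh=k), which keeps rows that have at least k non-missing values. Allowing at most m missing values means k = number of columns minus m. Here is a sketch on a hypothetical four-column frame with m = 1 (the exercise uses 5):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 has no gaps, row 1 has one, row 2 has three
toy = pd.DataFrame({
    'a': [1.0, np.nan, np.nan],
    'b': [2.0, 2.0, np.nan],
    'c': [3.0, 3.0, np.nan],
    'd': [4.0, 4.0, 4.0],
})
max_missing = 1  # hypothetical limit for this toy example

# Boolean-mask version, as in the exercise
kept_mask = toy[toy.isnull().sum(axis=1) <= max_missing]
# dropna(thresh=k) keeps rows with at least k non-missing values
kept_thresh = toy.dropna(thresh=toy.shape[1] - max_missing)

print(kept_mask.index.tolist())  # [0, 1]
```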
Mean Imputation
Description Impute the mean of the column 'Product_Base_Margin' in place of its missing values, then print the percentage of missing values in each column.
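# First Method: fillna replaces only the missing entries with the column mean
import pandas as pd
df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
mean_margin = df['Product_Base_Margin'].mean()
df['Product_Base_Margin'] = df['Product_Base_Margin'].fillna(mean_margin)
print(round(100*df.isnull().sum()/len(df.index),2))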
# Second Method: assign the mean only where the value is NaN
import numpy as np
import pandas as pd
df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
df.loc[np.isnan(df['Product_Base_Margin']), ['Product_Base_Margin']] = df['Product_Base_Margin'].mean()
print(round(100*(df.isnull().sum()/len(df.index)), 2))
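fillna is the usual way to impute only the gaps. Assigning the mean directly to the column overwrites every row, including values that were already present. Here is a sketch on a hypothetical column:

```python
import numpy as np
import pandas as pd

# Hypothetical column with two missing entries
toy = pd.DataFrame({'Product_Base_Margin': [0.4, np.nan, 0.6, np.nan]})

# mean() skips NaN by default, so this is (0.4 + 0.6) / 2
mean_value = toy['Product_Base_Margin'].mean()

# Correct: only the NaN entries are replaced
filled = toy['Product_Base_Margin'].fillna(mean_value)
print(filled.tolist())  # [0.4, 0.5, 0.6, 0.5]

# Pitfall: assigning the scalar replaces every row, including 0.4 and 0.6
overwritten = toy.copy()
overwritten['Product_Base_Margin'] = mean_value
print(overwritten['Product_Base_Margin'].tolist())  # [0.5, 0.5, 0.5, 0.5]
```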