Python3 Practice Program

Remove the leading spaces from the string input_str = ' This is my first code'

input_str = ' This is my first code'

# lstrip() removes only leading whitespace; strip() would also remove trailing whitespace

final_str = input_str.lstrip()

print(final_str)
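The three strip variants differ only in which side they trim, which is easy to miss. A minimal sketch comparing them follows; the sample string with spaces on both sides is an assumption made for illustration.

```python
# Hypothetical sample with whitespace on both sides
sample = '  This is my first code  '

left_trimmed = sample.lstrip()    # removes leading whitespace only
right_trimmed = sample.rstrip()   # removes trailing whitespace only
both_trimmed = sample.strip()     # removes whitespace on both sides

# repr() makes the remaining spaces visible
print(repr(left_trimmed))
print(repr(right_trimmed))
print(repr(both_trimmed))
```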

String Split

Description Split the string input_str = 'Kumar_Ravi_003' to the person's second name, first name and unique customer code. In this example, second_name= 'Kumar', first_name= 'Ravi', customer_code = '003'.

A sample output of the input 'Kumar_Ravi_003' is: Ravi Kumar 003

Note that you need to print in the order first name, last name and customer code.

# First Method

input_str ='Kumar_Ravi_003'

first_name = input_str[6:10]

second_name = input_str[0:5]#write your answer here

customer_code = input_str[11:14]#write your answer here

print(first_name)

print(second_name)

print(customer_code)

# Second Method

input_str ='Kumar_Ravi_003'

name = input_str.split('_')

second_name = name[0]

first_name = name[1]

customer_code = name[2]

print(first_name + " " + second_name + " " + customer_code)

input_str = 'Kumar_Ravi_003'

#first we split the string using '_'

#first name will be the 2nd element (index 1) second name will be first element(index 0)

n_list=input_str.split("_")

first_name = n_list[1]#write your answer here

second_name = n_list[0]#write your answer here

customer_code = n_list[2]#write your answer here

print(first_name)

print(second_name)

print(customer_code)
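Because split('_') returns exactly three parts for this input, the parts can also be unpacked directly into named variables. This is only a sketch and assumes the input always has the form second_first_code.

```python
input_str = 'Kumar_Ravi_003'

# Unpacking raises ValueError if the string does not split into exactly three parts
second_name, first_name, customer_code = input_str.split('_')

# Print in the order first name, second name, customer code
output = f"{first_name} {second_name} {customer_code}"
print(output)
```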

List_remove_append

Description Remove SPSS from input_list=['SAS', 'R', 'PYTHON', 'SPSS'] and add 'SPARK' in its place.

input_list = ['SAS', 'R', 'PYTHON', 'SPSS']

# Write code to remove 'SPSS'

input_list.remove('SPSS')

# Write code to append 'SPARK'

input_list.append("SPARK")

print(input_list)
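The append() call places 'SPARK' at the end, which matches "in its place" only because 'SPSS' was the last element. A sketch that replaces the element wherever it sits is shown below; the reordered list is a hypothetical example.

```python
# Hypothetical ordering in which 'SPSS' is not the last element
input_list = ['SAS', 'SPSS', 'R', 'PYTHON']

# index() raises ValueError if 'SPSS' is absent
position = input_list.index('SPSS')

# Assigning by index swaps the value without moving the other elements
input_list[position] = 'SPARK'

print(input_list)
```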

string to list conversion

Description Convert a string input_str = 'I love Data Science & Python' to a list by splitting it on ‘&’. The sample output for this string will be: ['I love Data Science ', ' Python']

input_str = 'I love Data Science & Python'

#we will simply split the string at &

output_list = input_str.split('&')#Type your answer here

print(output_list)

List to String

Description Convert a list ['Pythons syntax is easy to learn', 'Pythons syntax is very clear'] to a string using ‘&’. The sample output of this string will be: Pythons syntax is easy to learn & Pythons syntax is very clear

Note that there is a space on both sides of '&' (as usual in English sentences).

input_str = ['Pythons syntax is easy to learn', 'Pythons syntax is very clear']

string_1 =" & ".join(input_str) #Type your answer here

print(string_1)

Nested List

Description Extract Python from a nested list input_list = [['SAS','R'],['Tableau','SQL'],['Python','Java']]

input_list = [['SAS','R'],['Tableau','SQL'],['Python','Java']]

print(input_list[2][0])

Tuple

Description Add the element ‘Python’ to a tuple input_tuple = ('Monty Python', 'British', 1969). Since tuples are immutable, one way to do this is to convert the tuple to a list, add the element, and convert it back to a tuple.

To learn how to convert a list to a tuple, search for it on Google / Stack Overflow etc.

input_tuple = ('Monty Python', 'British', 1969)

list1=list(input_tuple)

list1.append('Python')

print(tuple(list1))
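Another way, sketched here as an alternative to the list round trip, is to build a new tuple by concatenation. The original tuple is left unchanged, which is consistent with tuples being immutable.

```python
input_tuple = ('Monty Python', 'British', 1969)

# The trailing comma makes ('Python',) a one-element tuple
new_tuple = input_tuple + ('Python',)

print(new_tuple)
```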

Dict_Error

Description From a Dictionary input_dict={'Name': 'Monty', 'Profession': 'Singer' }, get the value of a key ‘Label’ which is not a part of the dictionary, in such a way that Python doesn't hit an error. If the key does not exist in the dictionary, Python should return 'NA'.

# Method 1

input_dict={'Name': 'Monty', 'Profession': 'Singer' }

input_dict.get('Label','NA')

# Method 2

input_dict={'Name': 'Monty', 'Profession': 'Singer' }

if('Label' in input_dict.keys()):

    answer = input_dict['Label']

else:

    answer='NA'

print(answer)

Getting a Value from a Dictionary.

Description Extract the company headed by Tim Cook from the dictionary {'Jack Dorsey': 'Twitter', 'Tim Cook': 'Apple','Jeff Bezos': 'Amazon','Mukesh Ambani': 'RJIO'}

input_dict={'Jack Dorsey': 'Twitter', 'Tim Cook': 'Apple','Jeff Bezos': 'Amazon','Mukesh Ambani': 'RJIO'}

name = input_dict['Tim Cook']

print(name)

List of Values in a Dictionary.

Description Create a SORTED list of all values from the dictionary input_dict = {'Jack Dorsey' : 'Twitter' , 'Tim Cook' : 'Apple','Jeff Bezos' : 'Amazon' ,'Mukesh Ambani' : 'RJIO'}

input_dict = {'Jack Dorsey' : 'Twitter' , 'Tim Cook' : 'Apple','Jeff Bezos' : 'Amazon' ,'Mukesh Ambani' : 'RJIO'}

value_list = input_dict.values()

print(sorted(value_list))

What will the output of the following set of instructions be?

d = {'Python':40, 'R':45}

print(list(d.keys()))

O/p : ['Python', 'R']

Set_diff

Description Find the difference, using difference and symmetric_difference, between two given lists - list1 and list2.

First, convert the lists into sets and store them as set_1 and set_2. Then store the difference and symmetric difference in answer_1 and answer_2 respectively. Print both the answers as sorted lists, i.e. convert the final sets to lists, sort it and then return it.

list_1 = [1,2,3,4,5,6]

list_2 = [2,3,4,5,6,7,8,9]

set_1 = set(list_1)

set_2 = set(list_2)

answer_1 = sorted(list(set_1.difference(set_2)))

answer_2 = sorted(list(set_1.symmetric_difference(set_2)))

print(answer_1)

print(answer_2)

o/p

[1]

[1, 7, 8, 9]
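The same two results can be obtained with set operators instead of method calls. A short sketch using the lists from the exercise:

```python
set_1 = {1, 2, 3, 4, 5, 6}
set_2 = {2, 3, 4, 5, 6, 7, 8, 9}

# '-' is the operator form of difference()
answer_1 = sorted(set_1 - set_2)

# '^' is the operator form of symmetric_difference()
answer_2 = sorted(set_1 ^ set_2)

print(answer_1)
print(answer_2)
```

Note that sorted() accepts a set directly and already returns a list, so the extra list() call is optional.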

If-Else

Description Write a code to check if the string in input_str starts with a vowel or not. Print capital YES or NO.

For example, if input_str = 'analytics' then, your output should print 'YES'.

#method1

input_str="alpha"

if input_str[0] in ['a','e','i','o','u']:

    print('YES')

else:

    print('NO')

#Method2

input_str="alpha"

i=input_str[0]

if(i in "aeiou"):

    print('YES')

else:

    print('NO')
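Both methods index input_str[0], which raises IndexError for an empty string, and both treat an uppercase first letter such as 'Alpha' as a non-vowel. The sketch below handles these cases; the helper name starts_with_vowel is hypothetical, and whether uppercase vowels should count is an assumption about the requirement.

```python
def starts_with_vowel(text):
    """Return 'YES' if text starts with a vowel (any case), else 'NO'."""
    # startswith() accepts a tuple of prefixes and returns False for an empty string
    if text.lower().startswith(('a', 'e', 'i', 'o', 'u')):
        return 'YES'
    return 'NO'

print(starts_with_vowel('analytics'))
print(starts_with_vowel('python'))
```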

What will the following segment of code print? Try solving it verbally.

if True or True:

    if False and True or False:

        print('A')

    elif False and False or True and True:

        print('B')

    else:

        print('C')

else:

    print('D')

O/P : B
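The answer follows from operator precedence: 'and' binds more tightly than 'or'. The sketch below writes the two inner conditions with explicit parentheses to show how Python groups them.

```python
# 'and' is evaluated before 'or', so the conditions group like this
first_condition = (False and True) or False                # False, so 'A' is skipped
second_condition = (False and False) or (True and True)    # True, so 'B' is printed

print(first_condition, second_condition)
```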

What will the following segment of code print? Try doing this verbally.

if (10 < 0) and (0 < -10):

    print("A")

elif (10 > 0) or False:

    print("B")

else:

    print("C")

O/P : B

Creating a List Comprehension

Description You are given an integer 'n' as the input. Use a list comprehension to create a list containing the squares of the integers from 1 to n (both included), and print the list.

For example, if the input is 4, the output should be a list as follows:

[1, 4, 9, 16]

#Method1

n = int(input('Enter number'))

square=[i**2 for i in range(1,n+1)]

print(square)

Enter number4

[1, 4, 9, 16]

#Method2

n = int(input('Enter number'))

# Write your code here (remember to print the list)

final_list=[i**2 for i in range(1,n+1)] #remember to use range(1,n+1)

#using range(n) will give 0,1,2,... n-1

#we want 1, 2, 3, 4, ... n

print(final_list)

Function

Description Create a function squared(), which takes x and y as arguments and returns the x**y value. For e.g., if x = 2 and y = 3 , then the output is 8.

input_list = ['6','7']

x = int(input_list[0])

y = int(input_list[1])

def squared(x,y):

    return(x**y)

print(squared(x,y))

Lambda

Description Create a lambda function 'greater', which takes two arguments x and y and returns x if x>y, otherwise y. If x = 2 and y = 3, then the output should be 3.

#Method1

input_list = [4,5]

a = int(input_list[0])

b = int(input_list[1])

greater=lambda x,y: x if x>y else y

print(greater(a,b))

#Method2

input_list = [4,5]

a = int(input_list[0])

b = int(input_list[1])

#Write your code here

def greater(a,b):

    if(a>b):

        return a

    return b

print(greater(a,b))

Print word number of Times

def say(message, times = 1):

    print(message * times)

say('Hello')

say('World', 5)

Map Function

Description Using the Map function, create a list 'cube', which consists of the cube of numbers in input_list.

For e.g. if the input list is [5,6,4,8,9], the output should be [125, 216, 64, 512, 729]

input_list = [5,6,4,8,9]

cube=list(map(lambda x: x**3, input_list))

print(cube)

Map Function

Description Using the function Map, count the number of words that start with ‘S’ in input_list.

input_list = ['San Jose', 'San Francisco', 'Santa Fe', 'Houston']

count = sum(map(lambda x: x[0] == 'S', input_list))

print(count)

Map Function

Description Create a list ‘name’ consisting of the combination of the first name and the second name from list 1 and 2 respectively.

For e.g. if the input list is: [ ['Ankur', 'Avik', 'Kiran', 'Nitin'], ['Narang', 'Sarkar', 'R', 'Sareen']]

the output list should be the list: ['Ankur Narang', 'Avik Sarkar', 'Kiran R', 'Nitin Sareen']

input_list = [['Ankur','Avik','Kiran','Nitin'],['Narang','Sarkar','R','Sareen']]

first_name = input_list[0]

last_name = input_list[1]

combine=lambda x,y:x+' '+y

name = list(map(combine,first_name,last_name))

print(name)

O/P : ['Ankur Narang', 'Avik Sarkar', 'Kiran R', 'Nitin Sareen']
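For comparison, the same pairing can be written with zip() and a list comprehension. This is a sketch of an equivalent approach, not the map()-based answer the exercise asks for.

```python
first_name = ['Ankur', 'Avik', 'Kiran', 'Nitin']
last_name = ['Narang', 'Sarkar', 'R', 'Sareen']

# zip() pairs elements by position and stops at the shorter list, like map() with two iterables
name = [f"{first} {last}" for first, last in zip(first_name, last_name)]

print(name)
```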

Filter Function

Description You are given a list of strings such as input_list = ['hdjk', 'salsap', 'sherpa'].

Extract a list of names that start with an ‘s’ and end with a ‘p’ (both 's' and 'p' are lowercase) in input_list.

Note: Use the filter() function

input_list = ['hdjk', 'salsap', 'sherpa']

sp =list(filter(lambda x:x[0]=='s' and x[-1]=='p',input_list))

print(sp)

O/P: ['salsap']

Reduce Function

Description Using the Reduce function, concatenate a list of words in input_list, and print the output as a string. If input_list = ['I','Love','Python'], the output should be the string 'I Love Python'.

input_list=['All','you','have','to','fear','is','fear','itself']

from functools import reduce

result=reduce(lambda x,y: x+" "+y, input_list)

print(result)

O/P: All you have to fear is fear itself

Reduce Function

Description You are given a list of numbers such as input_list = [31, 63, 76, 89]. Find and print the largest number in input_list using the reduce() function

input_list = [65,76,87,23,12,90,99]

from functools import reduce

answer = reduce(lambda x,y: x if x>y else y,input_list)

print(answer)

How will you extract ‘love’ from the string S = “I love Python”?

S = "I love Python"

print(S[2:6])

print(S[2:-7])

print(S[-11:-7])

love

Dictionary Iteration

What will the output be of the following code?

D = {1:['Raj', 22], 2:['Simran', 21], 3:['Rahul', 40]}

for val in D:

    print(val)

O/p

1

2

3

Python Comprehensions

What will the ‘comprehension equivalent’ be for the following snippet of code?

for sentence in paragraph:

    for word in sentence.split():

        single_word_list.append(word)

Answer

[word for sentence in paragraph for word in sentence.split()]

Feedback :

[word for sentence in paragraph for word in sentence.split()] is the right comprehension equivalent of the code provided. You need to put it in square brackets [] since the output will be a list.
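To see that the two forms agree, the sketch below runs both on a small sample. The contents of paragraph are hypothetical, since the question does not define them.

```python
# Hypothetical sample data: a list of sentences
paragraph = ['I love Python', 'Data Science is fun']

# Nested loop version from the question
single_word_list = []
for sentence in paragraph:
    for word in sentence.split():
        single_word_list.append(word)

# Comprehension version: the for clauses appear in the same order as the nested loops
comprehension_list = [word for sentence in paragraph for word in sentence.split()]

print(single_word_list)
print(comprehension_list)
```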

Function Arguments

What will the output of the following code be?

def my_func(*args):

    return(sum(args))

print(my_func(1,2,3,4,5))

print(my_func(6,7,8))

15

21

How will you get the squares of all the numbers in a list L = [1, 2, 3, 4]?

L = [1, 2, 3, 4]

print(list(map(lambda x : x ** 2, L)))

[1, 4, 9, 16]

Factorial

Description Given a number ‘n’, output its factorial using reduce(). Note: Make sure you handle the edge case of zero. As you know, 0! = 1

P.S.: Finding the factorial without using the reduce() function might lead to deduction of marks.

Examples:

Input 1: 1 Output 1: 1

Input 2: 3 Output 2: 6

from functools import reduce

n = int(input())

def factorial(n):

    if (n == 0):

        return 1

    else:

        return reduce( lambda x,y:x*y , range(1,n+1))

print(factorial(n))

#with Reduce Function

n = int(input())

# 0! is 1, so the n == 0 branch must give 1, not 0

fact = reduce(lambda x, y: x*y, range(1, n + 1)) if n>0 else 1 if n == 0 else 'factorial not possible'

print(fact)

#method2

# Read the input as an integer

n = int(input())

# Import the reduce() function

from functools import reduce

# If n is zero, simply print 1, because reduce() without an initial value fails on an empty range

if n==0:

    print(1)

# In all other cases, use reduce() between the range 1 and (n+1). For this range,

# define a lambda function with x and y and keep multiplying them using reduce().

# This way, when the code reaches the end of the range, i.e. n, the factorial

# computation will be complete.

else:

    print(reduce(lambda x, y: x * y, range(1, n+1)))
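The zero case can also be handled inside reduce() itself by passing an initial value as the third argument. The sketch below assumes n is a non-negative integer and does not validate negative input.

```python
from functools import reduce

def factorial(n):
    """Factorial of a non-negative integer n using reduce()."""
    # With the initial value 1, reduce() returns 1 for the empty range when n == 0
    return reduce(lambda x, y: x * y, range(1, n + 1), 1)

print(factorial(0))
print(factorial(3))
```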

Missing Values removal

Description Count the number of missing values in each column of the dataset 'marks'.

import pandas as pd

marks = pd.read_csv('https://query.data.world/s/HqjNNadqEnwSq1qnoV_JqyRJkc7o6O')

print(marks.isnull().sum())

Prefix 0

Assignment 2

Tutorial 12

Midterm 16

TakeHome 9

Final 5

dtype: int64

Removing rows with missing values

Description Remove all the rows in the dataset 'marks' having 5 missing values and then print the number of missing values in each column.

import pandas as pd

marks = pd.read_csv('https://query.data.world/s/HqjNNadqEnwSq1qnoV_JqyRJkc7o6O')

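# thresh=2 keeps only rows with at least 2 non-missing values; with 6 columns this drops rows with 5 or more missing values

marks=marks.dropna(thresh=2)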

print(marks.isnull().sum())

#Method1

import pandas as pd

df = pd.read_csv('https://query.data.world/s/HqjNNadqEnwSq1qnoV_JqyRJkc7o6O')

df = df[df.isnull().sum(axis=1) != 5]

print(df.isnull().sum())

Removing extra characters from a column

Description The given data frame 'customer' has a column 'Cust_id' which has values Cust_1, Cust_2 and so on. Remove the repeated 'Cust_' from the column Cust_id so that the output column Cust_id have just numbers like 1, 2, 3 and so on. Print the first 10 rows of the dataset 'customer' after processing.

#METHOD1

import pandas as pd

customer = pd.read_csv('https://query.data.world/s/y9rxL9mGdP6AXPiDaIL4yYm6DsfTV2')

customer['Cust_id'] =customer['Cust_id'].str.replace("Cust_",'')

print(customer.head(10))

#METHOD2

import pandas as pd

customer = pd.read_csv('https://query.data.world/s/y9rxL9mGdP6AXPiDaIL4yYm6DsfTV2')

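# Note: strip('Cust_') removes any of the characters C, u, s, t, _ from both ends, not the literal prefix; it works here because the remaining IDs are digits

customer['Cust_id'] = customer['Cust_id'].map(lambda x: x.strip('Cust_'))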

print(customer.head(10))

Customer_Name Province Region Customer_Segment Cust_id

0 MUHAMMED MACINTYRE NUNAVUT NUNAVUT SMALL BUSINESS 1

1 BARRY FRENCH NUNAVUT NUNAVUT CONSUMER 2

2 CLAY ROZENDAL NUNAVUT NUNAVUT CORPORATE 3

3 CARLOS SOLTERO NUNAVUT NUNAVUT CONSUMER 4

4 CARL JACKSON NUNAVUT NUNAVUT CORPORATE 5

Rounding decimal places of a column

Description The given dataframe 'sleepstudy' has a column 'Reaction' with floating-point values of up to 4 decimal places. Round the values to 1 decimal place.

# Method1

from pydataset import data

sleepstudy =data('sleepstudy')

sleepstudy['Reaction'] = sleepstudy['Reaction'].round(1)

print(sleepstudy.head(10))

#Method2

from pydataset import data

sleepstudy =data('sleepstudy')

sleepstudy['Reaction'] = sleepstudy['Reaction'].round(decimals=1)

print(sleepstudy.head(10))

Reaction Days Subject

1 249.6 0 308

2 258.7 1 308

3 250.8 2 308

Duplicated Rows

Description The given Dataframe 'rating' has repeated rows. You need to remove the duplicated rows.

import pandas as pd

rating = pd.read_csv('https://query.data.world/s/EX0EpmqwfA2UYGz1Xtd_zi4R0dQpog')

rating_update = rating.drop_duplicates()

print(rating.shape)

print(rating_update.shape)

(1254, 5)

(1149, 5)

Derived Variable

Description The given dataset 'cust_rating' has 3 columns, i.e. 'rating', 'food_rating' and 'service_rating'. Create a new variable 'avg_rating' holding the average of the three ratings, rounded to the nearest whole number.

import pandas as pd

cust_rating = pd.read_csv('https://query.data.world/s/ILc-P4llUraMaYN6N6Bdw7p6kUvHnj')

cust_rating['avg_rating'] = round( (cust_rating['rating']+ cust_rating['food_rating']+ cust_rating['service_rating'])/3)

print(cust_rating.head(10))

userID placeID rating food_rating service_rating avg_rating

0 U1077 135085 2 2 2 2.0

1 U1077 135038 2 2 1 2.0

2 U1077 132825 2 2 2 2.0

Extracting Day From a Date

Description The given dataset 'order' has a variable 'Order_Date' with the dates of purchase. Create a new variable 'day' which will contain the day from the date at variable Order_Date.

import pandas as pd

order = pd.read_csv('https://query.data.world/s/3hIAtsCE7vYkPEL-O5DyWJAeS5Af-7')

order['Order_Date'] = pd.to_datetime(order['Order_Date'])

order['day'] = order['Order_Date'].dt.day

print(order.head(10))

Order_ID Order_Date Order_Priority Ord_id day

0 3 2010-10-13 LOW Ord_1 13

1 293 2012-01-10 HIGH Ord_2 10

2 483 2011-10-07 HIGH Ord_3 7

3 515 2010-08-28 NOT SPECIFIED Ord_4 28

4 613 2011-06-17 HIGH Ord_5 17

#Alternate Method

import pandas as pd

order = pd.read_csv('https://query.data.world/s/3hIAtsCE7vYkPEL-O5DyWJAeS5Af-7')

order['Order_Date'] = pd.to_datetime(order['Order_Date'])

order['day'] = order['Order_Date'].apply(lambda x: x.day)

print(order.head(10))

You're given a list of non-negative integers. Your task is to round the given numbers to the nearest multiple of 10. For instance, 15 should be rounded to 20 whereas 14 should be rounded to 10. After rounding the numbers, find their sum.

Hint: The Python built-in function round() rounds ties to the nearest even digit, so round(0.25, 1) returns 0.2. You might want to write your own rounding function to meet the requirement.

Sample input (a list): [2, 18, 10]

Sample output (an integer): 30

import ast,sys

import math

input_str = sys.stdin.read()

input_list = [2, 18, 10]

# write code here

# rounds to nearest, ties away from zero

def custom_round(n, ndigits=1):

    """

    Takes in any decimal number and outputs rounded number

    examples:

    0.25 is rounded to 0.3

    0.35 is rounded to 0.4

    0.21 is rounded to 0.2

    """

    part = n * 10 ** ndigits

    delta = part - int(part)

    # round to nearest, ties away from zero

    if delta >= 0.5:

        part = math.ceil(part)

    else:

        part = math.floor(part)

    return part / (10 ** ndigits)

def round_to_nearest_10(n):

    """ takes in 15 and outputs 20"""

    return int(100*custom_round(n/100, 1))

rounded_list = list(map(round_to_nearest_10, input_list))

result = sum(rounded_list)

# do not change the following code

print(result)
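Since the inputs are non-negative integers, the rounding can also be done with integer arithmetic only, which avoids scaling through floating-point values. This is a sketch of an alternative, not the approach used above; ties such as 15 round up, as the problem statement requires.

```python
def round_to_nearest_10(n):
    """Round a non-negative integer to the nearest multiple of 10, ties rounding up."""
    # Adding 5 pushes values ending in 5..9 into the next ten before floor division
    return (n + 5) // 10 * 10

input_list = [2, 18, 10]
result = sum(round_to_nearest_10(n) for n in input_list)

print(result)
```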

Sum and Squares

You're given a natural number 'n'. First, calculate the sum of squares of all the natural numbers up to 'n'. Then calculate the square of the sum of all natural numbers up to 'n'. Return the absolute difference of these two quantities.

For instance, if n=3, then natural numbers up to 3 are: 1, 2 and 3. The sum of squares up to 3 will be 1^2 + 2^2 + 3^2 = 14. The square of the sum of natural numbers up to 3 is (1+2+3)^2=36. The result, which is their absolute difference is 22.

Sample input (an integer): 3

Sample output (an integer): 22

n = 3

def sum2(n):

    s=0

    for i in range(n+1):

        s=s+i**2

    return s

def sum1(n):

    s=0

    for i in range(n+1):

        s=s+i

    return s**2

# store the result in the following variable

abs_difference = sum1(n)-sum2(n)

# print result --- do not change the following code

print(abs_difference)

#Method 1

import ast,sys

input_str = sys.stdin.read()

n = 3

# write your code here

numbers = [number+1 for number in range(n)]

sum_of_squares = sum(list(map(lambda x: x**2, numbers)))

square_of_sum = sum(numbers)**2

# store the result in the following variable

abs_difference = abs(sum_of_squares - square_of_sum)

# print result --- do not change the following code

print(abs_difference)
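Both quantities also have well-known closed forms: 1 + 2 + ... + n = n(n+1)/2 and 1^2 + 2^2 + ... + n^2 = n(n+1)(2n+1)/6. The sketch below uses them to avoid the loops; the function name is hypothetical.

```python
def abs_difference_closed_form(n):
    """Absolute difference between the square of the sum and the sum of squares of 1..n."""
    # Integer division is exact here because both products are divisible by 6 and 2 respectively
    sum_of_squares = n * (n + 1) * (2 * n + 1) // 6
    square_of_sum = (n * (n + 1) // 2) ** 2
    return abs(square_of_sum - sum_of_squares)

print(abs_difference_closed_form(3))
```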

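def reverse(s):

    # avoid naming the variable 'str', which would shadow the built-in type

    reversed_str = ""

    for i in s:

        reversed_str = i + reversed_str

    return reversed_str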

s = ['ram', 'krishn','mishra']

rev=list(map(reverse,s))

print(rev)
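A slice with a step of -1 reverses a string in one expression, so the helper function is optional. A short sketch on the same list:

```python
s = ['ram', 'krishn', 'mishra']

# word[::-1] walks the string from the last character to the first
rev = [word[::-1] for word in s]

print(rev)
```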

Weird Function

In data science, quite often you need to implement research papers and write code according to what's present in those papers. Research papers have a lot of maths involved and you need to implement the maths in code. In this exercise, you're required to implement some maths in code. The problem is as follows:

For fixed integers a, b, c, define a weird function F(n) as follows:

F(n) = n - c for all n > b

F(n) = F(a + F(a + F(a + F(a + n)))) for all n ≤ b.

Also, define S(a, b, c) = ∑F(n) where n takes the values 0 till b [in other words, S(a, b, c) = F(0) + F(1) + F(2) + .... F(b-1) + F(b)].

The input will be the value of a, b and c. The output should be S(a, b, c). You can define the functions in your own customized way with no restrictions on the number of parameters. For example, you can define the function S which can take additional parameters than a, b and c. Just make sure the code behaves as per the maths.

For example, if a = 20, b = 100 and c = 15, then F(0) = 195 and F(2000) = 1985. Therefore, S(20, 100, 15) = 14245

import numpy as np

input_list = [20,100,15]

a = input_list[0]

b = input_list[1]

c = input_list[2]

# write code here

def weird_function(a,b,c,n):

    if n>b:

        return n-c

    else:

        return weird_function(a,b,c,a+weird_function(a,b,c,a+weird_function(a,b,c,a+weird_function(a,b,c,a+n))))

def large_sum(a, b, c):

    total = 0

    for value in range(b+1):

        total += weird_function(a, b, c, value)

    return total

# store the result in the following variable

result = large_sum(a, b, c)

# print result -- do not change the following code

print(result)
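The recursion evaluates the same F(n) values many times. One possible refinement, sketched below, caches results with functools.lru_cache; the factory function make_weird_function is a hypothetical name, and the sketch is checked only against the example values given in the problem statement.

```python
from functools import lru_cache

def make_weird_function(a, b, c):
    """Build F for fixed a, b, c so that only n needs to be cached."""
    @lru_cache(maxsize=None)
    def F(n):
        if n > b:
            return n - c
        return F(a + F(a + F(a + F(a + n))))
    return F

def S(a, b, c):
    """Sum F(n) for n = 0..b."""
    F = make_weird_function(a, b, c)
    return sum(F(n) for n in range(b + 1))

print(S(20, 100, 15))
```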

Python Program to check Armstrong Number

def digit(n):

    # count the number of digits in n

    count=0

    while n!=0:

        n=n//10

        count=count+1

    return count

def armstrong(n,d):

    # sum of each digit raised to the power d; 'total' avoids shadowing the built-in sum

    total=0

    while n!=0:

        rem=n%10

        total=total+(pow(rem,d))

        n=n//10

    return total

n=int(input('Enter number'))

print(armstrong(n,digit(n)))

if(n==armstrong(n,digit(n))):

    print('Number is Armstrong')

else:

    print('Number is Not Armstrong')
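The same check can be sketched more compactly by treating the number as a string of digits. The function name is_armstrong is hypothetical, and the sketch assumes a non-negative integer input.

```python
def is_armstrong(n):
    """Return True if n equals the sum of its digits, each raised to the digit count."""
    digits = str(n)
    power = len(digits)
    return n == sum(int(d) ** power for d in digits)

print(is_armstrong(153))
print(is_armstrong(10))
```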

Swap two rows

Description Given m and n, swap the mth and nth rows of the 2-D NumPy array given below.

a = [[4 3 1] [5 7 0] [9 9 3] [8 2 4]]

import numpy as np

# Given array

a = np.array([[4, 3, 1], [5, 7, 0], [9, 9, 3], [8, 2, 4]])

# Read the values of m and n

m = 0

n = 2

a[[m,n]]=a[[n,m]]

# Print the array after swapping

print(a)

Create border array

Description Given a single integer n, create an (n x n) 2D array with 1 on the border and 0 on the inside.

Note: Make sure the array is of type int.

Example:

Input 1: 4

Output 1:

[[1 1 1 1]
 [1 0 0 1]
 [1 0 0 1]
 [1 1 1 1]]

Input 2: 2

Output 2:

[[1 1]
 [1 1]]

# Read the variable from STDIN

n = int(input())

import numpy as np

a=np.ones((n,n), dtype=int)

a[1:-1,1:-1] = 0

print(a)

Set Index in Dataframe

Description Using set_index command set the column 'X' as the index of the dataset and then print the head of the dataset. Hint: Use inplace = False

import pandas as pd

df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')

df_2 = df.set_index('X',inplace=False)

print(df_2.head())

Y month day FFMC DMC DC ISI temp RH wind rain area

X

7 5 mar fri 86.2 26.2 94.3 5.1 8.2 51 6.7 0.0 0.0

7 4 oct tue 90.6 35.4 669.1 6.7 18.0 33 0.9 0.0 0.0

Sorting Dataframes

Description Sort the dataframe on 'month' and 'day' in ascending order in the dataframe 'df'.

import pandas as pd

df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')

df_2 = df.sort_values(by=['month','day'],ascending=True)

print(df_2.head(20))

X Y month day FFMC DMC DC ISI temp RH wind rain area

241 4 4 apr fri 83.0 23.3 85.3 2.3 16.7 20 3.1 0.0 0.00

442 6 5 apr mon 87.9 24.9 41.6 3.7 10.9 64 3.1 0.0 3.35

19 6 4 apr sat 86.3 27.4 97.1 5.1 9.3 44 4.5 0.0 0.00

239 7 5 apr sun 81.9 3.0 7.9 3.5 13.4 75 1.8 0.0 0.00

DataFrames

Description Given a dataframe 'df', use the following commands and analyse the result: describe(), columns and shape.

import numpy as np

import pandas as pd

df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')

print(df.describe())

print(df.columns)

print(df.shape)

X Y FFMC ... wind rain area

count 517.000000 517.000000 517.000000 ... 517.000000 517.000000 517.000000

mean 4.669246 4.299807 90.644681 ... 4.017602 0.021663 12.847292

std 2.313778 1.229900 5.520111 ... 1.791653 0.295959 63.655818

min 1.000000 2.000000 18.700000 ... 0.400000 0.000000 0.000000

Indexing Dataframes

Description Print only the even-numbered rows of the dataframe 'df'.

Note: Don't include the row indexed zero.

import pandas as pd

df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')

df_2 = df[2::2]

print(df_2.head(20))

X Y month day FFMC DMC DC ISI temp RH wind rain area

2 7 4 oct sat 90.6 43.7 686.9 6.7 14.6 33 1.3 0.0 0.0

4 8 6 mar sun 89.3 51.3 102.2 9.6 11.4 99 1.8 0.0 0.0

6 8 6 aug mon 92.3 88.9 495.6 8.5 24.1 27 3.1 0.0 0.0

8 8 6 sep tue 91.0 129.5 692.6 7.0 13.1 63 5.4 0.0 0.0

10 7 5 sep sat 92.5 88.0 698.6 7.1 17.8 51 7.2 0.0 0.0

12 6 5 aug fri 63.5 70.8 665.3 0.8 17.0 72 6.7 0.0 0.0

Remove the leading spaces from the string input_str = ' This is my first code'

#Remove the leading spaces from the string input_str = ' This is my first code'

# Reading the input as a string; ignore the following two lines

input_str = ' This is my first Code'

final_str = input_str.lstrip()

print(final_str)

This is my first Code

String Split

Description Split the string input_str = 'Kumar_Ravi_003' to the person's second name, first name and unique customer code. In this example, second_name= 'Kumar', first_name= 'Ravi', customer_code = '003'.

A sample output of the input 'Kumar_Ravi_003' is: Ravi Kumar 003

Note that you need to print in the order first name, last name and customer code.

# First Method

input_str ='Kumar_Ravi_003'

first_name = input_str[6:10]

second_name = input_str[0:5]#write your answer here

customer_code = input_str[11:14]#write your answer here

print(first_name)

print(second_name)

print(customer_code)

Ravi

Kumar

003

# Second Method

input_str ='Kumar_Ravi_003'

name = input_str.split('_')

second_name = name[0]

first_name = name[1]

customer_code = name[2]

print(first_name + " " + second_name + " " + customer_code)

Ravi Kumar 003

input_str = 'Kumar_Ravi_003'

#first we split the string using '_'

#first name will be the 2nd element (index 1) second name will be first element(index 0)

n_list=input_str.split("_")

first_name = n_list[1]#write your answer here

second_name = n_list[0]#write your answer here

customer_code = n_list[2]#write your answer here

print(first_name)

print(second_name)

print(customer_code)

Ravi

Kumar

003

17 / 3 # classic division returns a float

17 // 3 # floor division discards the fractional part

List_remove_append

Description Remove SPSS from input_list=['SAS', 'R', 'PYTHON', 'SPSS'] and add 'SPARK' in its place.

input_list = ['SAS', 'R', 'PYTHON', 'SPSS']

# Write code to remove 'SPSS'

input_list.remove('SPSS')

# Write code to append 'SPARK'

input_list.append("SPARK")

print(input_list)

['SAS', 'R', 'PYTHON', 'SPARK']

string to list conversion

Description Convert a string input_str = 'I love Data Science & Python' to a list by splitting it on ‘&’. The sample output for this string will be: ['I love Data Science ', ' Python']

input_str = 'I love Data Science & Python'

#we will simply split the string at &

output_list = input_str.split('&')#Type your answer here

print(output_list)

['I love Data Science ', ' Python']

List to String

Description Convert a list ['Pythons syntax is easy to learn', 'Pythons syntax is very clear'] to a string using ‘&’. The sample output of this string will be: Pythons syntax is easy to learn & Pythons syntax is very clear

Note that there is a space on both sides of '&' (as usual in English sentences).

input_str = ['Pythons syntax is easy to learn', 'Pythons syntax is very clear']

string_1 =" & ".join(input_str) #Type your answer here

print(string_1)

Pythons syntax is easy to learn & Pythons syntax is very clear

Nested List

Description Extract Python from a nested list input_list = [['SAS','R'],['Tableau','SQL'],['Python','Java']]

input_list = [['SAS','R'],['Tableau','SQL'],['Python','Java']]

print(input_list[2][0])

Python

Tuple

Description Add the element ‘Python’ to a tuple input_tuple = ('Monty Python', 'British', 1969). Since tuples are immutable, one way to do this is to convert the tuple to a list, add the element, and convert it back to a tuple.

To learn how to convert a list to a tuple, search for it on Google / Stack Overflow etc.

input_tuple = ('Monty Python', 'British', 1969)

list1=list(input_tuple)

list1.append('Python')

print(tuple(list1))

('Monty Python', 'British', 1969, 'Python')

Dict_Error

Description From a Dictionary input_dict={'Name': 'Monty', 'Profession': 'Singer' }, get the value of a key ‘Label’ which is not a part of the dictionary, in such a way that Python doesn't hit an error. If the key does not exist in the dictionary, Python should return 'NA'.

# Method 1

input_dict={'Name': 'Monty', 'Profession': 'Singer' }

input_dict.get('Label','NA')

'NA'

# Method 2

input_dict={'Name': 'Monty', 'Profession': 'Singer' }

if('Label' in input_dict.keys()):

    answer = input_dict['Label']

else:

    answer='NA'

print(answer)

NA

Getting a Value from a Dictionary.

Description Extract the company headed by Tim Cook from the dictionary {'Jack Dorsey': 'Twitter', 'Tim Cook': 'Apple','Jeff Bezos': 'Amazon','Mukesh Ambani': 'RJIO'}

input_dict={'Jack Dorsey': 'Twitter', 'Tim Cook': 'Apple','Jeff Bezos': 'Amazon','Mukesh Ambani': 'RJIO'}

name = input_dict['Tim Cook']

print(name)

Apple

List of Values in a Dictionary.

Description Create a SORTED list of all values from the dictionary input_dict = {'Jack Dorsey' : 'Twitter' , 'Tim Cook' : 'Apple','Jeff Bezos' : 'Amazon' ,'Mukesh Ambani' : 'RJIO'}

input_dict = {'Jack Dorsey' : 'Twitter' , 'Tim Cook' : 'Apple','Jeff Bezos' : 'Amazon' ,'Mukesh Ambani' : 'RJIO'}

value_list = input_dict.values()

print(sorted(value_list))

['Amazon', 'Apple', 'RJIO', 'Twitter']

What will the output of the following set of instructions be?

d = {'Python':40, 'R':45}

print(list(d.keys()))

['Python', 'R']

Set_diff

Description Find the difference, using difference and symmetric_difference, between two given lists - list1 and list2.

First, convert the lists into sets and store them as set_1 and set_2. Then store the difference and symmetric difference in answer_1 and answer_2 respectively. Print both the answers as sorted lists, i.e. convert the final sets to lists, sort it and then return it.

list_1 = [1,2,3,4,5,6]

list_2 = [2,3,4,5,6,7,8,9]

set_1 = set(list_1)

set_2 = set(list_2)

answer_1 = sorted(list(set_1.difference(set_2)))

answer_2 = sorted(list(set_1.symmetric_difference(set_2)))

print(answer_1)

print(answer_2)

[1]

[1, 7, 8, 9]

If-Else

Description Write a code to check if the string in input_str starts with a vowel or not. Print capital YES or NO.

For example, if input_str = 'analytics' then, your output should print 'YES'.

#method1

input_str="alpha"

if input_str[0] in ['a','e','i','o','u']:

    print('YES')

else:

    print('NO')

YES

#Method2

input_str="alpha"

i=input_str[0]

if(i in "aeiou"):

    print('YES')

else:

    print('NO')

YES

What will the following segment of code print? Try solving it verbally.

if True or True:

    if False and True or False:

        print('A')

    elif False and False or True and True:

        print('B')

    else:

        print('C')

else:

    print('D')

B

What will the following segment of code print? Try doing this verbally.

if (10 < 0) and (0 < -10):

    print("A")

elif (10 > 0) or False:

    print("B")

else:

    print("C")

B

Creating a List Comprehension

Description You are given an integer 'n' as the input. Use a list comprehension to create a list containing the squares of the integers from 1 to n (both included), and print the list.

For example, if the input is 4, the output should be a list as follows:

[1, 4, 9, 16]

#Method1

n = int(input('Enter number'))

square=[i**2 for i in range(1,n+1)]

print(square)

Enter number4

[1, 4, 9, 16]

#Method2

n = int(input('Enter number'))

# Write your code here (remember to print the list)

final_list=[i**2 for i in range(1,n+1)] #remember to use range(1,n+1)

#using range(n) will give 0,1,2,... n-1

#we want 1, 2, 3, 4, ... n

print(final_list)

Enter number5

[1, 4, 9, 16, 25]

Function

Description Create a function squared(), which takes x and y as arguments and returns the x**y value. For e.g., if x = 2 and y = 3 , then the output is 8.

input_list = ['6','7']
x = int(input_list[0])
y = int(input_list[1])

def squared(x, y):
    return x**y

print(squared(x,y))

279936

Lambda

Description Create a lambda function 'greater', which takes two arguments x and y and returns x if x>y, otherwise y. If x = 2 and y = 3, then the output should be 3.

#Method1

input_list = [4,5]

a = int(input_list[0])

b = int(input_list[1])

greater=lambda x,y: x if x>y else y

print(greater(a,b))

5

#Method2
input_list = [4,5]
a = int(input_list[0])
b = int(input_list[1])

#Write your code here
# a regular function with the same behaviour as the lambda
def greater(a, b):
    if a > b:
        return a
    return b

print(greater(a,b))

5

def say(message, times = 1):
    print(message * times)

say('Hello')
say('World', 5)

Hello

WorldWorldWorldWorldWorld

Map Function

Description Using the Map function, create a list 'cube', which consists of the cube of numbers in input_list.

For e.g. if the input list is [5,6,4,8,9], the output should be [125, 216, 64, 512, 729]

input_list = [5,6,4,8,9]

cube=list(map(lambda x: x**3, input_list))

print(cube)

[125, 216, 64, 512, 729]

Map Function

Description Using the function Map, count the number of words that start with ‘S’ in input_list.

input_list = ['San Jose', 'San Francisco', 'Santa Fe', 'Houston']

count = sum(map(lambda x: x[0] == 'S', input_list))

print(count)

3

Map Function

Description Create a list ‘name’ consisting of the combination of the first name and the second name from list 1 and 2 respectively.

For e.g. if the input list is: [ ['Ankur', 'Avik', 'Kiran', 'Nitin'], ['Narang', 'Sarkar', 'R', 'Sareen']]

the output list should be the list: ['Ankur Narang', 'Avik Sarkar', 'Kiran R', 'Nitin Sareen']

input_list = [['Ankur','Avik','Kiran','Nitin'],['Narang','Sarkar','R','Sareen']]

first_name = input_list[0]

last_name = input_list[1]

combine=lambda x,y:x+' '+y

name = list(map(combine,first_name,last_name))

print(name)

['Ankur Narang', 'Avik Sarkar', 'Kiran R', 'Nitin Sareen']

Filter Function

Description You are given a list of strings such as input_list = ['hdjk', 'salsap', 'sherpa'].

Extract a list of names that start with an ‘s’ and end with a ‘p’ (both 's' and 'p' are lowercase) in input_list.

Note: Use the filter() function.

input_list = ['hdjk', 'salsap', 'sherpa']
# keep only the strings that start with a lowercase 's' and end with a lowercase 'p'
sp = list(filter(lambda x: x.startswith('s') and x.endswith('p'), input_list))
print(sp)

['salsap']

Reduce Function

Description Using the Reduce function, concatenate a list of words in input_list, and print the output as a string. If input_list = ['I','Love','Python'], the output should be the string 'I Love Python'.

input_list=['All','you','have','to','fear','is','fear','itself']

from functools import reduce

result=reduce(lambda x,y: x+" "+y, input_list)

print(result)

All you have to fear is fear itself

Reduce Function

Description You are given a list of numbers such as input_list = [31, 63, 76, 89]. Find and print the largest number in input_list using the reduce() function.

input_list = [65,76,87,23,12,90,99]

from functools import reduce

answer = reduce(lambda x,y: x if x>y else y,input_list)

print(answer)

99

How will you extract ‘love’ from the string S = “I love Python”?

S = "I love Python"
# all three slices select the characters at positions 2 to 5
print(S[2:6])
print(S[2:-7])
print(S[-11:-7])

love
love
love

Dictionary Iteration

What will the output be of the following code?

D = {1:['Raj', 22], 2:['Simran', 21], 3:['Rahul', 40]}
# iterating over a dictionary yields its keys
for val in D:
    print(val)

1

2

3

Python Comprehensions

What will the ‘comprehension equivalent’ be for the following snippet of code?

for sentence in paragraph:
    for word in sentence.split():
        single_word_list.append(word)

Answer [word for sentence in paragraph for word in sentence.split()]

Feedback : [word for sentence in paragraph for word in sentence.split()] is the right comprehension equivalent of the code provided. You need to put it in square brackets [] since the output will be a list.

Function Arguments

What will the output of the following code be?

def my_func(*args):
    return(sum(args))

print(my_func(1,2,3,4,5))

print(my_func(6,7,8))

15

21

How will you find the squares of all the numbers in a list L = [1, 2, 3, 4]?

L = [1, 2, 3, 4]

print(list(map(lambda x : x ** 2, L)))

[1, 4, 9, 16]

Factorial

Description Given a number ‘n’, output its factorial using reduce(). Note: Make sure you handle the edge case of zero. As you know, 0! = 1

P.S.: Finding the factorial without using the reduce() function might lead to deduction of marks.

Examples:

Input 1: 1 Output 1: 1

Input 2: 3 Output 2: 6

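from functools import reduce

n = 5  # sample input; the output below corresponds to n = 5

def factorial(n):
    # reduce() on an empty range raises an error, so handle 0 separately
    if n == 0:
        return 1
    else:
        return reduce(lambda x, y: x * y, range(1, n + 1))

print(factorial(n))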

120

from functools import reduce

n = int(input())
# 0! is 1; negative numbers have no factorial
fact = reduce(lambda x, y: x*y, range(1, n + 1)) if n > 0 else 1 if n == 0 else 'factorial not possible'
print(fact)

5

120

# Read the input as an integer
n = int(input())

# Import the reduce() function
from functools import reduce

# If n is zero, simply print 1, as reduce() can't handle this case without an initial value
if n == 0:
    print(1)
# In all other cases, use reduce() between the range 1 and (n+1). For this range,
# define a lambda function with x and y and keep multiplying them using reduce().
# This way, when the code reaches the end of the range, i.e. n, the factorial
# computation will be complete.
else:
    print(reduce(lambda x, y: x * y, range(1, n+1)))

7

5040
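The zero case can also be handled without a separate branch by giving reduce() an initial value. The following is a minimal sketch of that idea; the function name factorial_reduce is made up for illustration and is not part of the exercise.

```python
from functools import reduce
from operator import mul

def factorial_reduce(n):
    """Return n! for n >= 0 using reduce() with an initial value of 1."""
    if n < 0:
        raise ValueError("factorial is not defined for negative numbers")
    # With the initial value 1, an empty range (n == 0) simply returns 1.
    return reduce(mul, range(1, n + 1), 1)

print(factorial_reduce(0))  # 1
print(factorial_reduce(5))  # 120
```

The third argument to reduce() is used as the starting accumulator, so the multiplication still goes through reduce() as the exercise requires.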

Set Operations

Description In a school, there are a total of 20 students, numbered from 1 to 20. You’re given three lists named ‘C’, ‘F’, and ‘H’, representing students who play cricket, football, and hockey, respectively. Based on this information, find out and print the following:

1. Students who play all three sports
2. Students who play both cricket and football but don’t play hockey
3. Students who play exactly two of the sports
4. Students who don’t play any of the three sports

Format:
Input: 3 lists containing numbers (ranging from 1 to 20) representing students who play cricket, football and hockey respectively.
Output: 4 different lists containing the students according to the constraints provided in the questions.

Note: Make sure you sort the final lists (in ascending order) before printing them; otherwise your answer might not match the test cases.

Examples:
Input 1:
[2, 5, 9, 12, 13, 15, 16, 17, 18, 19]
[2, 4, 5, 6, 7, 9, 13, 16]
[1, 2, 5, 9, 10, 11, 12, 13, 15]
Output 1:
[2, 5, 9, 13]
[16]
[12, 15, 16]
[3, 8, 14, 20]

Explanation:
1. Given the three sets, you can see that the students numbered '2', '5', '9', and '13' play all three sports.
2. The student numbered '16' plays cricket and football but doesn't play hockey.
3. The students numbered '12' and '15' play cricket and hockey, and the student numbered '16' plays cricket and football. There are no students who play only football and hockey. Hence, the students who play exactly two sports are 12, 15, and 16.
4. As you can see, the students who play none of the sports are 3, 8, 14, and 20.
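The four groups described above map directly onto set intersections, unions and differences. The following is a minimal sketch under that reading; the function name sports_groups and its total_students parameter are assumptions made for illustration, and the input is the sample from the example.

```python
def sports_groups(C, F, H, total_students=20):
    """Return the four sorted lists asked for in the exercise."""
    c, f, h = set(C), set(F), set(H)
    everyone = set(range(1, total_students + 1))

    all_three = c & f & h
    cricket_football_not_hockey = (c & f) - h
    # Exactly two sports: in some pairwise intersection, but not in all three.
    exactly_two = ((c & f) | (c & h) | (f & h)) - all_three
    no_sport = everyone - (c | f | h)

    groups = (all_three, cricket_football_not_hockey, exactly_two, no_sport)
    return [sorted(group) for group in groups]

C = [2, 5, 9, 12, 13, 15, 16, 17, 18, 19]
F = [2, 4, 5, 6, 7, 9, 13, 16]
H = [1, 2, 5, 9, 10, 11, 12, 13, 15]

for group in sports_groups(C, F, H):
    print(group)
```

For the sample input this prints the four lists given under Output 1.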

import pandas as pd

rain=pd.read_csv(input1)

rain_g=rain.pivot_table(values='rain', index='COUNTRY', aggfunc='mean')

print(rain_g)

Missing Values removal

Description Count the number of missing values in each column of the dataset 'marks'.

import pandas as pd

marks = pd.read_csv('https://query.data.world/s/HqjNNadqEnwSq1qnoV_JqyRJkc7o6O')

print(marks.isnull().sum())

Prefix 0

Assignment 2

Tutorial 12

Midterm 16

TakeHome 9

Final 5

dtype: int64

Removing rows with missing values

Description Remove all the rows in the dataset 'marks' having 5 missing values and then print the number of missing values in each column.

import pandas as pd

marks = pd.read_csv('https://query.data.world/s/HqjNNadqEnwSq1qnoV_JqyRJkc7o6O')

marks=marks.dropna(thresh=2)

print(marks.isnull().sum())

Prefix 0

Assignment 0

Tutorial 10

Midterm 14

TakeHome 7

Final 3

dtype: int64

import pandas as pd

df = pd.read_csv('https://query.data.world/s/HqjNNadqEnwSq1qnoV_JqyRJkc7o6O')

df = df[df.isnull().sum(axis=1) != 5]

print(df.isnull().sum())

Prefix 0

Assignment 0

Tutorial 10

Midterm 14

TakeHome 7

Final 3

dtype: int64
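The two solutions above are not quite the same operation: dropna(thresh=2) keeps rows with at least 2 non-missing values, while the boolean mask removes rows with exactly 5 missing values. On a 6-column frame they agree as long as no row is missing all 6 values. The sketch below illustrates this on a made-up frame (the name toy and its values are hypothetical, not the real 'marks' data).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with 6 columns, shaped like 'marks' but made up.
toy = pd.DataFrame({
    'Prefix':     [1, 2, 3],
    'Assignment': [50.0, np.nan, 60.0],
    'Tutorial':   [70.0, np.nan, np.nan],
    'Midterm':    [55.0, np.nan, 40.0],
    'TakeHome':   [80.0, np.nan, np.nan],
    'Final':      [65.0, np.nan, 45.0],
})

# thresh=2 keeps rows that have at least 2 non-missing values,
# so in a 6-column frame it drops rows with 5 (or 6) missing values.
by_thresh = toy.dropna(thresh=2)

# The mask drops only the rows with exactly 5 missing values.
by_mask = toy[toy.isnull().sum(axis=1) != 5]

print(by_thresh.index.tolist())
print(by_mask.index.tolist())
```

Both approaches keep rows 0 and 2 here, because the only row removed is the one where everything except 'Prefix' is missing.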

Removing extra characters from a column

Description The given data frame 'customer' has a column 'Cust_id' which has values Cust_1, Cust_2 and so on. Remove the repeated 'Cust_' from the column Cust_id so that the output column Cust_id has just numbers like 1, 2, 3 and so on. Print the first 10 rows of the dataset 'customer' after processing.

#METHOD1

import pandas as pd

customer = pd.read_csv('https://query.data.world/s/y9rxL9mGdP6AXPiDaIL4yYm6DsfTV2')

customer['Cust_id'] =customer['Cust_id'].str.replace("Cust_",'')

print(customer.head(10))

Customer_Name Province Region Customer_Segment Cust_id

0 MUHAMMED MACINTYRE NUNAVUT NUNAVUT SMALL BUSINESS 1

1 BARRY FRENCH NUNAVUT NUNAVUT CONSUMER 2

2 CLAY ROZENDAL NUNAVUT NUNAVUT CORPORATE 3

3 CARLOS SOLTERO NUNAVUT NUNAVUT CONSUMER 4

4 CARL JACKSON NUNAVUT NUNAVUT CORPORATE 5

5 MONICA FEDERLE NUNAVUT NUNAVUT CORPORATE 6

6 DOROTHY BADDERS NUNAVUT NUNAVUT HOME OFFICE 7

7 NEOLA SCHNEIDER NUNAVUT NUNAVUT HOME OFFICE 8

8 CARLOS DALY NUNAVUT NUNAVUT HOME OFFICE 9

9 CLAUDIA MINER NUNAVUT NUNAVUT SMALL BUSINESS 10

#METHOD2

import pandas as pd

customer = pd.read_csv('https://query.data.world/s/y9rxL9mGdP6AXPiDaIL4yYm6DsfTV2')

customer['Cust_id'] = customer['Cust_id'].map(lambda x: x.strip('Cust_'))

print(customer.head(10))

Customer_Name Province Region Customer_Segment Cust_id

0 MUHAMMED MACINTYRE NUNAVUT NUNAVUT SMALL BUSINESS 1

1 BARRY FRENCH NUNAVUT NUNAVUT CONSUMER 2

2 CLAY ROZENDAL NUNAVUT NUNAVUT CORPORATE 3

3 CARLOS SOLTERO NUNAVUT NUNAVUT CONSUMER 4

4 CARL JACKSON NUNAVUT NUNAVUT CORPORATE 5

5 MONICA FEDERLE NUNAVUT NUNAVUT CORPORATE 6

6 DOROTHY BADDERS NUNAVUT NUNAVUT HOME OFFICE 7

7 NEOLA SCHNEIDER NUNAVUT NUNAVUT HOME OFFICE 8

8 CARLOS DALY NUNAVUT NUNAVUT HOME OFFICE 9

9 CLAUDIA MINER NUNAVUT NUNAVUT SMALL BUSINESS 10
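A caveat on METHOD2: str.strip('Cust_') does not remove the prefix 'Cust_'; it removes any of the characters C, u, s, t and _ from both ends of the string. It works here only because the remaining part is purely numeric. The sketch below shows the difference; the helper remove_prefix and the third ID are made up for illustration.

```python
def remove_prefix(text, prefix):
    """Drop `prefix` from the start of `text` if present; otherwise return text unchanged."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text

ids = ['Cust_1', 'Cust_27', 'Cust_tus9']  # the last ID is made up to show the pitfall

stripped = [x.strip('Cust_') for x in ids]
prefixed = [remove_prefix(x, 'Cust_') for x in ids]

print(stripped)  # ['1', '27', '9']
print(prefixed)  # ['1', '27', 'tus9']
```

For purely numeric suffixes both give the same result, which is why the two methods above produce identical output.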

Rounding decimal places of a column

Description The given dataframe 'sleepstudy' has a column 'Reaction' with floating-point values of up to 4 decimal places. Round the values off to 1 decimal place.

# Method1

from pydataset import data

sleepstudy =data('sleepstudy')

sleepstudy['Reaction'] = sleepstudy['Reaction'].round(1)

print(sleepstudy.head(10))

Reaction Days Subject

1 249.6 0 308

2 258.7 1 308

3 250.8 2 308

4 321.4 3 308

5 356.9 4 308

6 414.7 5 308

7 382.2 6 308

8 290.1 7 308

9 430.6 8 308

10 466.4 9 308

#Method2

from pydataset import data

sleepstudy =data('sleepstudy')

sleepstudy['Reaction'] = sleepstudy['Reaction'].round(decimals=1)

print(sleepstudy.head(10))

Reaction Days Subject

1 249.6 0 308

2 258.7 1 308

3 250.8 2 308

4 321.4 3 308

5 356.9 4 308

6 414.7 5 308

7 382.2 6 308

8 290.1 7 308

9 430.6 8 308

10 466.4 9 308

Duplicated Rows

Description The given Dataframe 'rating' has repeated rows. You need to remove the duplicated rows.

import pandas as pd

rating = pd.read_csv('https://query.data.world/s/EX0EpmqwfA2UYGz1Xtd_zi4R0dQpog')

rating_update = rating.drop_duplicates()

print(rating.shape)

print(rating_update.shape)

(1254, 5)

(1149, 5)

Derived Variable

Description The given dataset 'cust_rating' has 3 columns i.e 'rating', ' food_rating', 'service_rating'. Create a new variable 'avg_rating'.

import pandas as pd

cust_rating = pd.read_csv('https://query.data.world/s/ILc-P4llUraMaYN6N6Bdw7p6kUvHnj')

cust_rating['avg_rating'] = round( (cust_rating['rating']+ cust_rating['food_rating']+ cust_rating['service_rating'])/3)

print(cust_rating.head(10))

userID placeID rating food_rating service_rating avg_rating

0 U1077 135085 2 2 2 2.0

1 U1077 135038 2 2 1 2.0

2 U1077 132825 2 2 2 2.0

3 U1077 135060 1 2 2 2.0

4 U1068 135104 1 1 2 1.0

5 U1068 132740 0 0 0 0.0

6 U1068 132663 1 1 1 1.0

7 U1068 132732 0 0 0 0.0

8 U1068 132630 1 1 1 1.0

9 U1067 132584 2 2 2 2.0

Extracting Day From a Date

Description The given dataset 'order' has a variable 'Order_Date' with the dates of purchase. Create a new variable 'day' which will contain the day from the date at variable Order_Date.

import pandas as pd

order = pd.read_csv('https://query.data.world/s/3hIAtsCE7vYkPEL-O5DyWJAeS5Af-7')

order['Order_Date'] = pd.to_datetime(order['Order_Date'])

order['day'] = order['Order_Date'].dt.day

print(order.head(10))

Order_ID Order_Date Order_Priority Ord_id day

0 3 2010-10-13 LOW Ord_1 13

1 293 2012-01-10 HIGH Ord_2 10

2 483 2011-10-07 HIGH Ord_3 7

3 515 2010-08-28 NOT SPECIFIED Ord_4 28

4 613 2011-06-17 HIGH Ord_5 17

5 643 2011-03-24 HIGH Ord_6 24

6 678 2010-02-26 LOW Ord_7 26

7 807 2010-11-23 MEDIUM Ord_8 23

8 868 2012-08-06 NOT SPECIFIED Ord_9 6

9 933 2012-04-08 NOT SPECIFIED Ord_10 8

import pandas as pd

order = pd.read_csv('https://query.data.world/s/3hIAtsCE7vYkPEL-O5DyWJAeS5Af-7')

order['Order_Date'] = pd.to_datetime(order['Order_Date'])

order['day'] = order['Order_Date'].apply(lambda x: x.day)

print(order.head(10))

Order_ID Order_Date Order_Priority Ord_id day

0 3 2010-10-13 LOW Ord_1 13

1 293 2012-01-10 HIGH Ord_2 10

2 483 2011-10-07 HIGH Ord_3 7

3 515 2010-08-28 NOT SPECIFIED Ord_4 28

4 613 2011-06-17 HIGH Ord_5 17

5 643 2011-03-24 HIGH Ord_6 24

6 678 2010-02-26 LOW Ord_7 26

7 807 2010-11-23 MEDIUM Ord_8 23

8 868 2012-08-06 NOT SPECIFIED Ord_9 6

9 933 2012-04-08 NOT SPECIFIED Ord_10 8

Python Program for factorial of a number

def fact(x):
    # recursive definition: 0! = 1 and x! = x * (x-1)!
    if x == 0:
        return 1
    else:
        return x * fact(x - 1)

print(fact(5))

120

def fact(n):
    return 1 if (n == 1 or n == 0) else n * fact(n - 1)

n = 5
print("Factorial of " + str(n) + " =", fact(n))

Factorial of 5 = 120

Find the sum of a square series

Separate Letters from String

Python program to convert time from 12 hour to 24 hour format

Ordered and Unordered Categorical Variables

Categorical variables can be of two types - ordered categorical and unordered categorical. In unordered, it is not possible to say that a certain category is 'more or less' or 'higher or lower' than others. For example, color is such a variable (red is not greater or more than green etc.)

On the other hand, ordered categories have a notion of 'higher-lower', 'before-after', 'more-less' etc. For e.g. the age-group variable having three values - child, adult and old is ordered categorical because an old person is 'more aged' than an adult etc. In general, it is possible to define some kind of ordering.

The months in a year - Jan, Feb, March etc. Feedback : Months have an element of ordering - Jan comes before April, Dec comes after everything else etc. In general, all dates are ordered categorical variables (day 23 comes after day 11 of the month etc.)

Unordered Categorical Variables - Univariate Analysis

You have worked with some unordered categorical variables in the past, for example:

The Product_Category in the retail sales dataset

The Customer_Segment in the retail sales dataset

The name of a batsman in any of the cricket datasets

Now imagine someone (say a client) gives you only an unordered categorical variable (and nothing else!), such as a column of size 4000 named 'country_of_person' with 130 unique countries and asks you 'can you extract anything useful from just this one variable?'.

Write down how you would analyse just that variable to get something meaningful out of it. Note that you have only one column to analyse.

The only thing you can do with an unordered categorical variable is to count the frequency of each category in the column. For example, you could observe that the product category 'furniture' appears 1000 times, 'technology' appears 810 times and so on.
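As a rough illustration of this frequency count, pandas provides value_counts() on a Series. The data below is a made-up stand-in for the 'country_of_person' column, not a real dataset.

```python
import pandas as pd

# Made-up sample standing in for the 'country_of_person' column.
country_of_person = pd.Series(
    ['India', 'USA', 'India', 'Brazil', 'India', 'USA'],
    name='country_of_person',
)

counts = country_of_person.value_counts()                # absolute frequency, most common first
shares = country_of_person.value_counts(normalize=True)  # relative frequency (fractions of the total)

print(counts)
print(shares.round(2))
```

The counts (or the shares) can then be ranked or plotted as a bar chart to see which categories dominate the column.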

Ordered Categorical Variables

You have already worked with ordered categorical variables before - there is a certain order or notion of 'high-low', 'before-after' etc. among the categories. For e.g. days of the week (Monday comes before Tuesday), grades of students (A is better than B), number of overs bowled by a bowler (3, 4, 9) etc.

Which of the following are other examples of ordered categorical variables? Choose all the correct options.

Dates in a year e.g. Jan 2, Mar 15 etc. Feedback : Dates are ordered - each day comes before or after other days.

Star rating of a restaurant on Zomato on a scale of 1-5 Feedback : A rating of 5 is better than 4, 3, 2, 1.

Numeric and Ordered Categorical Variables

Anand mentioned that you can treat numeric variables as ordered categorical variables. For analysis, you can deliberately convert numeric variables into ordered categorical ones. For example, if you have the incomes of a few thousand people ranging from 5,000 to 100,000, you can categorise them into bins such as [5000, 10000], [10000, 15000] and [15000, 20000].

This is called 'binning'.
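One common way to bin a numeric column in pandas is pd.cut. The sketch below uses made-up incomes and hypothetical bin labels; it is only meant to show the idea, including how a value that sits exactly on a bin edge is assigned.

```python
import pandas as pd

incomes = pd.Series([5000, 7200, 9999, 10000, 14500, 19000])  # made-up incomes

bins = [5000, 10000, 15000, 20000]
labels = ['5k-10k', '10k-15k', '15k-20k']  # hypothetical labels

# right=False makes each bin include its left edge and exclude its right edge,
# so an income of exactly 10000 lands in '10k-15k'.
income_bin = pd.cut(incomes, bins=bins, labels=labels, right=False)

print(income_bin.tolist())
print(income_bin.cat.ordered)
```

The result is an ordered categorical column, so the bins can be sorted and compared like any other ordered categorical variable.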

Which of the following variables can be binned into ordered categorical variables? Mark all the correct options.

The temperature in a city over a certain time period Feedback : You can bin the temperatures as [0, 10 degrees], [10, 20 degrees] etc.

The revenue generated per day of a company Feedback : This can also be binned e.g. [0, 10k], [10k, 20k] etc.

import numpy as np
import pandas as pd

np.random.seed(1234)
# 10 rows of random values in four columns
df = pd.DataFrame(np.random.randn(10, 4),
                  columns=['Col1', 'Col2', 'Col3', 'Col4'])
boxplot = df.boxplot(column=['Col1', 'Col2', 'Col3', 'Col4'])

import pandas as pd

import seaborn as sns

import matplotlib.pyplot as plt

# Put dataset on my github repo

df = pd.read_csv('https://raw.githubusercontent.com/mGalarnyk/Python_Tutorials/master/Kaggle/BreastCancerWisconsin/data/data.csv')

df.head(5)

sns.boxplot(x='diagnosis', y='area_mean', data=df)

Alarm Clock

You're trying to automate your alarm clock by writing a function for it. You're given a day of the week encoded as 1=Mon, 2=Tue, ... 6=Sat, 7=Sun, and a boolean value (a boolean object is either True or False; Google "booleans python" to get a better understanding) indicating whether you are on vacation. Based on the day and whether you're on vacation, write a function that returns a time in the form of a string indicating when the alarm clock should ring.

When not on a vacation, on weekdays, the alarm should ring at "7:00" and on the weekends (Saturday and Sunday) it should ring at "10:00".

While on a vacation, it should ring at "10:00" on weekdays. On vacation, it should not ring on weekends, that is, it should return "off".

Sample input (a list): [7,True]

Sample output (a string):

off

Sample input (a list): [3,True]

Sample output (a string):

10:00

import ast,sys

input_list = [1,False]
day_of_the_week = input_list[0]
is_on_vacation = input_list[1]

# write your code here
def alarm_clock(day, vacation):
    weekday = day in (1, 2, 3, 4, 5)
    if weekday and vacation:
        return '10:00'
    elif weekday and not vacation:
        return '7:00'
    elif not weekday and not vacation:
        # weekend, not on vacation
        return '10:00'
    else:
        # weekend while on vacation
        return 'off'

# do not change the following code
time = alarm_clock(day_of_the_week, is_on_vacation)
print(time.lower())

7:00

import ast,sys

input_str = sys.stdin.read()
input_list = [1,False]
day_of_the_week = int(input_list[0])
is_on_vacation = input_list[1]

# write your code here
def alarm_clock(day, vacation):
    weekends = [6, 7]
    if vacation and day not in weekends:
        return "10:00"
    elif vacation and day in weekends:
        return "off"
    elif not vacation and day not in weekends:
        return "7:00"
    elif not vacation and day in weekends:
        return "10:00"

# do not change the following code
time = alarm_clock(day_of_the_week, is_on_vacation)
print(time.lower())

7:00

You're given a list of non-negative integers. Your task is to round the given numbers to the nearest multiple of 10. For instance, 15 should be rounded to 20 whereas 14 should be rounded to 10. After rounding the numbers, find their sum.

Hint: The built-in Python function round() rounds ties to the nearest even digit; for example, round(0.25, 1) gives 0.2. You might want to write your own function to round as per your requirement.

Sample input (a list): [2, 18, 10]

Sample output (an integer): 30

import ast,sys
import math

input_str = sys.stdin.read()
input_list = [2, 18, 10]

# write code here
def custom_round(n, ndigits=1):
    """
    Takes in any non-negative decimal number and outputs the rounded number
    examples:
    0.25 is rounded to 0.3
    0.35 is rounded to 0.4
    0.21 is rounded to 0.2
    """
    part = n * 10 ** ndigits
    delta = part - int(part)
    # round to nearest, ties away from zero
    if delta >= 0.5:
        part = math.ceil(part)
    else:
        part = math.floor(part)
    return part / (10 ** ndigits)

def round_to_nearest_10(n):
    """ takes in 15 and outputs 20"""
    # round() before int() guards against float error, e.g. 100*2.3 is 229.99999999999997
    return int(round(100*custom_round(n/100, 1)))

rounded_list = list(map(round_to_nearest_10, input_list))
result = sum(rounded_list)

# do not change the following code
print(result)

30
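Since the inputs are non-negative integers, the rounding can also be done with integer arithmetic only, which sidesteps floating-point scaling (for example, 100 * 2.3 evaluates to 229.99999999999997 in Python). The following is a minimal sketch of that alternative; the function name round_to_nearest_10_int is made up for illustration.

```python
def round_to_nearest_10_int(n):
    """Round a non-negative integer to the nearest multiple of 10, rounding halves up."""
    # Adding 5 pushes values ending in 5..9 into the next ten before flooring.
    return (n + 5) // 10 * 10

print([round_to_nearest_10_int(n) for n in [14, 15, 230]])   # [10, 20, 230]
print(sum(round_to_nearest_10_int(n) for n in [2, 18, 10]))  # 30
```

This gives the same answer as the sample (30 for [2, 18, 10]) without any division into floats.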

Sum and Squares

You're given a natural number 'n'. First, calculate the sum of squares of all the natural numbers up to 'n'. Then calculate the square of the sum of all natural numbers up to 'n'. Return the absolute difference of these two quantities.

For instance, if n=3, then natural numbers up to 3 are: 1, 2 and 3. The sum of squares up to 3 will be 1^2 + 2^2 + 3^2 = 14. The square of the sum of natural numbers up to 3 is (1+2+3)^2=36. The result, which is their absolute difference is 22.

Sample input (an integer): 3

Sample output (an integer): 22

n = 3

def sum2(n):
    # sum of the squares of 1..n
    s = 0
    for i in range(n+1):
        s = s + i**2
    return s

def sum1(n):
    # square of the sum of 1..n
    s = 0
    for i in range(n+1):
        s = s + i
    return s**2

# store the result in the following variable
abs_difference = abs(sum1(n) - sum2(n))

# print result --- do not change the following code
print(abs_difference)

22

import ast,sys

input_str = sys.stdin.read()

n = 3

# write your code here

numbers = [number+1 for number in range(n)]

sum_of_squares = sum(list(map(lambda x: x**2, numbers)))

square_of_sum = sum(numbers)**2

# store the result in the following variable

abs_difference = abs(sum_of_squares - square_of_sum)

# print result --- do not change the following code

print(abs_difference)

22
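Both quantities also have well-known closed forms: the sum of the first n squares is n(n+1)(2n+1)/6 and the sum of the first n natural numbers is n(n+1)/2. The sketch below uses them to avoid the loop; the function name sum_square_difference is made up for illustration.

```python
def sum_square_difference(n):
    """Absolute difference between the square of the sum and the sum of squares of 1..n."""
    sum_of_squares = n * (n + 1) * (2 * n + 1) // 6
    square_of_sum = (n * (n + 1) // 2) ** 2
    return abs(square_of_sum - sum_of_squares)

print(sum_square_difference(3))  # 22
```

For n = 3 this gives 36 - 14 = 22, matching the sample output.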

def reverse(s):
    # build the reversed string by prepending each character
    result = ""
    for ch in s:
        result = ch + result
    return result

s = ['ram', 'krishn', 'mishra']
rev = list(map(reverse, s))
print(rev)

['mar', 'nhsirk', 'arhsim']

Weird Function

In data science, quite often you need to implement research papers and write code according to what's present in those papers. Research papers have a lot of maths involved and you need to implement the maths in code. In this exercise, you're required to implement some maths in code. The problem is as follows:

For fixed integers a, b, c, define a weird function F(n) as follows:

F(n) = n - c for all n > b
F(n) = F(a + F(a + F(a + F(a + n)))) for all n ≤ b

Also, define S(a, b, c) = ∑F(n), where n takes the values 0 to b [in other words, S(a, b, c) = F(0) + F(1) + F(2) + ... + F(b-1) + F(b)].

The input will be the value of a, b and c. The output should be S(a, b, c). You can define the functions in your own customized way with no restrictions on the number of parameters. For example, you can define the function S which can take additional parameters than a, b and c. Just make sure the code behaves as per the maths.

For example, if a = 20, b = 100 and c = 15, then F(0) = 195 and F(2000) = 1985. Therefore, S(20, 100, 15) = 14245

input_list = [20,100,15]
a = input_list[0]
b = input_list[1]
c = input_list[2]

# write code here
def weird_function(a, b, c, n):
    # F(n) = n - c for n > b
    if n > b:
        return n - c
    # F(n) = F(a + F(a + F(a + F(a + n)))) for n <= b
    else:
        return weird_function(a,b,c,a+weird_function(a,b,c,a+weird_function(a,b,c,a+weird_function(a,b,c,a+n))))

def large_sum(a, b, c):
    # S(a, b, c) = F(0) + F(1) + ... + F(b)
    total = 0
    for value in range(b+1):
        total += weird_function(a, b, c, value)
    return total

# store the result in the following variable
result = large_sum(a, b, c)

# print result -- do not change the following code
print(result)

14245

Python Program to check Armstrong Number

def digit(n):
    # count the digits of n
    count = 0
    while n != 0:
        n = n // 10
        count = count + 1
    return count

def armstrong(n, d):
    # sum of each digit of n raised to the power d
    total = 0
    while n != 0:
        rem = n % 10
        total = total + pow(rem, d)
        n = n // 10
    return total

n = int(input('Enter number'))
print(armstrong(n, digit(n)))
if n == armstrong(n, digit(n)):
    print('Number is Armstrong')
else:
    print('Number is Not Armstrong')

Enter number1234

354

Number is Not Armstrong

Swap two rows

Description Given m and n, swap the mth and nth rows of the 2-D NumPy array given below.

a = [[4 3 1] [5 7 0] [9 9 3] [8 2 4]]

import numpy as np

# Given array

a = np.array([[4, 3, 1], [5, 7, 0], [9, 9, 3], [8, 2, 4]])

# Read the values of m and n

m = 0

n = 2

a[[m,n]]=a[[n,m]]

# Print the array after swapping

print(a)

[[9 9 3]

[5 7 0]

[4 3 1]

[8 2 4]]

Create border array

Description Given a single integer n, create an (n x n) 2D array with 1 on the border and 0 on the inside.

Note: Make sure the array is of type int.

Example:
Input 1: 4
Output 1:
[[1 1 1 1]
 [1 0 0 1]
 [1 0 0 1]
 [1 1 1 1]]
Input 2: 2
Output 2:
[[1 1]
 [1 1]]

# Read the variable from STDIN

n = int(input())

import numpy as np

a=np.ones((n,n), dtype=int)

a[1:-1,1:-1] = 0

print(a)

3

[[1 1 1]

[1 0 1]

[1 1 1]]

Set Index in Dataframe

Description Using set_index command set the column 'X' as the index of the dataset and then print the head of the dataset. Hint: Use inplace = False

import pandas as pd

df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')

df_2 = df.set_index('X',inplace=False)

print(df_2.head())

Y month day FFMC DMC DC ISI temp RH wind rain area

X

7 5 mar fri 86.2 26.2 94.3 5.1 8.2 51 6.7 0.0 0.0

7 4 oct tue 90.6 35.4 669.1 6.7 18.0 33 0.9 0.0 0.0

7 4 oct sat 90.6 43.7 686.9 6.7 14.6 33 1.3 0.0 0.0

8 6 mar fri 91.7 33.3 77.5 9.0 8.3 97 4.0 0.2 0.0

8 6 mar sun 89.3 51.3 102.2 9.6 11.4 99 1.8 0.0 0.0

Sorting Dataframes

Description Sort the dataframe on 'month' and 'day' in ascending order in the dataframe 'df'.

import pandas as pd

df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')

df_2 = df.sort_values(by=['month','day'],ascending=True)

print(df_2.head(20))

X Y month day FFMC DMC DC ISI temp RH wind rain area

241 4 4 apr fri 83.0 23.3 85.3 2.3 16.7 20 3.1 0.0 0.00

442 6 5 apr mon 87.9 24.9 41.6 3.7 10.9 64 3.1 0.0 3.35

19 6 4 apr sat 86.3 27.4 97.1 5.1 9.3 44 4.5 0.0 0.00

239 7 5 apr sun 81.9 3.0 7.9 3.5 13.4 75 1.8 0.0 0.00

469 6 3 apr sun 91.0 14.6 25.6 12.3 13.7 33 9.4 0.0 61.13

470 5 4 apr sun 91.0 14.6 25.6 12.3 17.6 27 5.8 0.0 0.00

176 6 5 apr thu 81.5 9.1 55.2 2.7 5.8 54 5.8 0.0 4.61

196 6 5 apr thu 81.5 9.1 55.2 2.7 5.8 54 5.8 0.0 10.93

240 6 3 apr wed 88.0 17.2 43.5 3.8 15.2 51 2.7 0.0 0.00

12 6 5 aug fri 63.5 70.8 665.3 0.8 17.0 72 6.7 0.0 0.00

78 1 2 aug fri 90.1 108.0 529.8 12.5 14.7 66 2.7 0.0 0.00

142 8 6 aug fri 90.1 108.0 529.8 12.5 21.2 51 8.9 0.0 0.61

184 8 6 aug fri 93.9 135.7 586.7 15.1 20.8 34 4.9 0.0 6.96
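Note that sorting on 'month' as plain text orders the months alphabetically (apr, aug, ...), as the output above shows. If calendar order is wanted instead, one option is to turn the column into an ordered categorical first. The sketch below uses a small made-up frame (toy and its values are hypothetical) rather than the real dataset.

```python
import pandas as pd

# Small made-up frame with the same kind of 'month' column.
toy = pd.DataFrame({
    'month': ['oct', 'mar', 'aug', 'apr'],
    'temp': [18.0, 8.2, 24.1, 16.7],
})

month_order = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
               'jul', 'aug', 'sep', 'oct', 'nov', 'dec']

# An ordered categorical sorts by the position in `categories`, not alphabetically.
toy['month'] = pd.Categorical(toy['month'], categories=month_order, ordered=True)
sorted_months = toy.sort_values(by='month')['month'].tolist()

print(sorted_months)  # ['mar', 'apr', 'aug', 'oct']
```

The same idea would apply to the 'day' column with a Monday-to-Sunday category list.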

DataFrames

Description Given a dataframe 'df' use the following commands and analyse the result. describe() columns shape

import numpy as np

import pandas as pd

df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')

print(df.describe())

print(df.columns)

print(df.shape)

X Y FFMC ... wind rain area

count 517.000000 517.000000 517.000000 ... 517.000000 517.000000 517.000000

mean 4.669246 4.299807 90.644681 ... 4.017602 0.021663 12.847292

std 2.313778 1.229900 5.520111 ... 1.791653 0.295959 63.655818

min 1.000000 2.000000 18.700000 ... 0.400000 0.000000 0.000000

25% 3.000000 4.000000 90.200000 ... 2.700000 0.000000 0.000000

50% 4.000000 4.000000 91.600000 ... 4.000000 0.000000 0.520000

75% 7.000000 5.000000 92.900000 ... 4.900000 0.000000 6.570000

max 9.000000 9.000000 96.200000 ... 9.400000 6.400000 1090.840000

[8 rows x 11 columns]

Index(['X', 'Y', 'month', 'day', 'FFMC', 'DMC', 'DC', 'ISI', 'temp', 'RH',

'wind', 'rain', 'area'],

dtype='object')

(517, 13)

Indexing Dataframes

Description Print only the even-numbered rows of the dataframe 'df'.

Note: Don't include the row indexed zero.

import pandas as pd

df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')

df_2 = df[2::2]

print(df_2.head(20))

X Y month day FFMC DMC DC ISI temp RH wind rain area

2 7 4 oct sat 90.6 43.7 686.9 6.7 14.6 33 1.3 0.0 0.0

4 8 6 mar sun 89.3 51.3 102.2 9.6 11.4 99 1.8 0.0 0.0

6 8 6 aug mon 92.3 88.9 495.6 8.5 24.1 27 3.1 0.0 0.0

Selecting Columns of a Dataframe

Description Print out the columns 'month', 'day', 'temp', 'area' from the dataframe 'df'.

import pandas as pd

df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')

df_2 = df[['month','day','temp','area']]

print(df_2.head(20))

month day temp area

0 mar fri 8.2 0.0

1 oct tue 18.0 0.0

2 oct sat 14.6 0.0

3 mar fri 8.3 0.0

Dataframe iloc

Description Using iloc index the dataframe to print all the rows of the columns at index 3,4,5. Hint: Use 3,4,5 not 2,3,4

import pandas as pd

df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')

df_2 = df.iloc[:,[3,4,5]]

print(df_2.head(5))

day FFMC DMC

0 fri 86.2 26.2

1 tue 90.6 35.4

2 sat 90.6 43.7

Dataframes loc

Description Using the loc function, print out all the columns and the rows from 2 to 20 of the 'df' dataset.

import pandas as pd

df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')

df_2 = df.loc[2:20,:]

print(df_2)

X Y month day FFMC DMC DC ISI temp RH wind rain area

2 7 4 oct sat 90.6 43.7 686.9 6.7 14.6 33 1.3 0.0 0.0

3 8 6 mar fri 91.7 33.3 77.5 9.0 8.3 97 4.0 0.2 0.0

4 8 6 mar sun 89.3 51.3 102.2 9.6 11.4 99 1.8 0.0 0.0
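One detail worth checking when moving between loc and iloc: a loc slice is label-based and includes its end label, while an iloc slice is position-based and excludes its end position. A short sketch on a made-up dataframe (the name 'toy' is an assumption):

```python
import pandas as pd

# Hypothetical frame with the default integer index 0..6.
toy = pd.DataFrame({'value': list('abcdefg')})

by_label = toy.loc[2:5, :]       # labels 2, 3, 4 and 5 (end included)
by_position = toy.iloc[2:5, :]   # positions 2, 3 and 4 (end excluded)

print(by_label.index.tolist())
print(by_position.index.tolist())
```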

Applying Conditions on Dataframes

Description Print all the columns and the rows where 'area' is greater than 0, 'wind' is greater than 1 and the 'temp' is greater than 15.

import pandas as pd

df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')

df_2 = df.loc[(df.area>0)&(df.wind>1)&(df.temp>15),:]

print(df_2.head(5))

X Y month day FFMC DMC DC ISI temp RH wind rain area

138 9 9 jul tue 85.8 48.3 313.4 3.9 18.0 42 2.7 0.0 0.36

139 1 4 sep tue 91.0 129.5 692.6 7.0 21.7 38 2.2 0.0 0.43

140 2 5 sep mon 90.9 126.5 686.5 7.0 21.9 39 1.8 0.0 0.47
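Each comparison produces a boolean Series, and '&' combines them row by row; the parentheses are needed because '&' binds more tightly than '>'. A minimal sketch on made-up values (the name 'toy' and its numbers are assumptions), with DataFrame.query shown as an equivalent spelling:

```python
import pandas as pd

# Hypothetical rows; only the row at index 1 satisfies all three conditions.
toy = pd.DataFrame({
    'area': [0.0, 0.36, 0.43, 5.0],
    'wind': [2.7, 2.7, 0.9, 4.0],
    'temp': [18.0, 18.0, 21.7, 10.0],
})

# Build the mask once, then use it to select rows.
mask = (toy['area'] > 0) & (toy['wind'] > 1) & (toy['temp'] > 15)
filtered = toy.loc[mask, :]

# The same filter expressed as a query string.
same = toy.query('area > 0 and wind > 1 and temp > 15')
print(filtered)
```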

Dataframes Merge

Description Perform an inner merge on two data frames df_1 and df_2 on 'unique_id' and print the combined dataframe.

import pandas as pd

df_1 = pd.read_csv('https://query.data.world/s/vv3snq28bp0TJq2ggCdxGOghEQKPZo')

df_2 = pd.read_csv('https://query.data.world/s/9wVKjNT0yiRc3YbVJaiI8a6HGl2d74')

df_3 = pd.merge(df_1, df_2, how='inner', on='unique_id')

print(df_3.head(2))
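An inner merge keeps only the keys that appear in both dataframes. A small sketch with made-up frames (the names 'left' and 'right' and their columns 'name' and 'score' are assumptions; only 'unique_id' comes from the exercise):

```python
import pandas as pd

# Hypothetical inputs that share the keys 2 and 3.
left = pd.DataFrame({'unique_id': [1, 2, 3], 'name': ['a', 'b', 'c']})
right = pd.DataFrame({'unique_id': [2, 3, 4], 'score': [20, 30, 40]})

# Keys 1 and 4 appear on one side only, so the inner merge drops them.
merged = pd.merge(left, right, how='inner', on='unique_id')
print(merged)
```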


Dataframe Append

Description Append two datasets df_1 and df_2, and print the combined dataframe.

import warnings

warnings.filterwarnings('ignore')

import pandas as pd

df_1 = pd.read_csv('https://query.data.world/s/vv3snq28bp0TJq2ggCdxGOghEQKPZo')

df_2 = pd.read_csv('https://query.data.world/s/9wVKjNT0yiRc3YbVJaiI8a6HGl2d74')

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows instead

df_3 = pd.concat([df_1, df_2])

print(df_3.head())
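Stacking two dataframes row-wise can be sketched with pd.concat, which works on current pandas versions (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0). The frames 'top' and 'bottom' below are made up for illustration:

```python
import pandas as pd

# Hypothetical frames with the same columns.
top = pd.DataFrame({'unique_id': [1, 2], 'score': [10, 20]})
bottom = pd.DataFrame({'unique_id': [3, 4], 'score': [30, 40]})

# Default behaviour keeps each frame's original index labels.
stacked = pd.concat([top, bottom])

# ignore_index=True renumbers the rows 0..n-1 instead.
renumbered = pd.concat([top, bottom], ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
```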

Operations on multiple dataframes

Description Given three data frames containing the number of gold, silver, and bronze Olympic medals won by some countries, determine the total number of medals won by each country. Note: The three data frames do not all contain the same countries, so use the ‘fill_value’ argument (set it to zero) to avoid getting NaN values. Also, sort the final dataframe by the total medal count in descending order.

import numpy as np

import pandas as pd

# Defining the three dataframes indicating the gold, silver, and bronze medal counts

# of different countries

gold = pd.DataFrame({'Country': ['USA', 'France', 'Russia'],

'Medals': [15, 13, 9]}

)

silver = pd.DataFrame({'Country': ['USA', 'Germany', 'Russia'],

'Medals': [29, 20, 16]}

)

bronze = pd.DataFrame({'Country': ['France', 'USA', 'UK'],

'Medals': [40, 28, 27]}

)

# Set the index of the dataframes to 'Country' so that you can get the countrywise

# medal count

gold.set_index('Country', inplace = True)

silver.set_index('Country', inplace = True)

bronze.set_index('Country', inplace = True)

# Add the three dataframes and set the fill_value argument to zero to avoid getting

# NaN values

total = gold.add(silver, fill_value = 0).add(bronze, fill_value = 0)

# Sort the resultant dataframe in a descending order

total = total.sort_values(by = 'Medals', ascending = False)

# Print the sorted dataframe

print(total)
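As a cross-check, the same totals can be reached by a different route: stack the three tables and sum per country. This is only an alternative sketch, not the method the exercise asks for; it needs no fill_value because a country missing from one table simply contributes no row.

```python
import pandas as pd

# Same medal tables as in the exercise.
gold = pd.DataFrame({'Country': ['USA', 'France', 'Russia'],
                     'Medals': [15, 13, 9]})
silver = pd.DataFrame({'Country': ['USA', 'Germany', 'Russia'],
                       'Medals': [29, 20, 16]})
bronze = pd.DataFrame({'Country': ['France', 'USA', 'UK'],
                       'Medals': [40, 28, 27]})

# Stack the rows, sum the medals per country, then sort in descending order.
total_alt = (pd.concat([gold, silver, bronze])
             .groupby('Country')['Medals']
             .sum()
             .sort_values(ascending=False))
print(total_alt)
```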

Dataframe grouping

Description Group the data 'df' by 'month' and 'day' and find the mean value for column 'rain' and 'wind'.

import pandas as pd

df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')

df_md=df.groupby(['month','day'])

df_1 = df_md[['rain','wind']].mean()

print(df_1.head())

rain wind

month day

apr fri 0.0 3.100000

mon 0.0 3.100000

sat 0.0 4.500000

sun 0.0 5.666667

thu 0.0 5.800000
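Grouping by two columns produces a result indexed by (month, day) pairs. A small sketch on made-up data (the name 'toy' and its values are assumptions) shows how to read one group's mean and how to flatten the result back into ordinary columns:

```python
import pandas as pd

# Hypothetical observations; two of them fall in the ('apr', 'fri') group.
toy = pd.DataFrame({
    'month': ['apr', 'apr', 'apr', 'aug'],
    'day': ['fri', 'fri', 'mon', 'fri'],
    'rain': [0.0, 0.2, 0.0, 1.0],
    'wind': [3.0, 5.0, 3.1, 4.0],
})

# Select the columns with a list (double brackets) before aggregating.
means = toy.groupby(['month', 'day'])[['rain', 'wind']].mean()
print(means)

# Look up one group with a (month, day) tuple.
apr_fri_wind = means.loc[('apr', 'fri'), 'wind']

# reset_index turns the group keys back into regular columns.
flat = means.reset_index()
```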

Creating New Column in a Dataframe

Description Create a new column 'XY' consisting of the values obtained by multiplying column 'X' by column 'Y'.

import pandas as pd

df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')

df['XY'] = df['X']*df['Y']

print(df.head())

X Y month day FFMC DMC DC ISI temp RH wind rain area XY

0 7 5 mar fri 86.2 26.2 94.3 5.1 8.2 51 6.7 0.0 0.0 35

1 7 4 oct tue 90.6 35.4 669.1 6.7 18.0 33 0.9 0.0 0.0 28

2 7 4 oct sat 90.6 43.7 686.9 6.7 14.6 33 1.3 0.0 0.0 28

Dataframe Pivot Table

Description Group the data 'df' by 'month' and 'day' and find the mean value for column 'rain' and 'wind' using the pivot table command.

import numpy as np

import pandas as pd

df = pd.read_csv('https://query.data.world/s/vBDCsoHCytUSLKkLvq851k2b8JOCkF')

df_1=pd.pivot_table(df, values=['rain','wind'], index=['month', 'day'], aggfunc='mean')

print(df_1.head(10))

rain wind

month day

apr fri 0.000000 3.100000

mon 0.000000 3.100000

sat 0.000000 4.500000

sun 0.000000 5.666667
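The pivot table output above matches the earlier groupby output. A brief sketch on made-up data (the name 'toy' and its values are assumptions) compares the two approaches directly:

```python
import pandas as pd

# Hypothetical observations.
toy = pd.DataFrame({
    'month': ['apr', 'apr', 'aug', 'aug'],
    'day': ['fri', 'fri', 'mon', 'sat'],
    'rain': [0.0, 0.4, 0.2, 0.0],
    'wind': [2.0, 4.0, 3.1, 5.4],
})

# Mean of 'rain' and 'wind' per (month, day), computed two ways.
by_pivot = pd.pivot_table(toy, values=['rain', 'wind'],
                          index=['month', 'day'], aggfunc='mean')
by_group = toy.groupby(['month', 'day'])[['rain', 'wind']].mean()

# Largest absolute difference between the two results.
max_diff = (by_pivot[['rain', 'wind']] - by_group).abs().max().max()
print(max_diff)
```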

Missing Values

Description Print out the number of missing values in each column in the given dataframe.

import pandas as pd

df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')

print(df.isnull().sum())

Missing Values Percentage

Description Find out the percentage of missing values in each column in the given dataset.

import pandas as pd

df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')

print(round(100*df.isnull().sum()/len(df.index),2))

Ord_id 0.00

Profit 0.65

Shipping_Cost 0.65

Product_Base_Margin 1.30

dtype: float64
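The percentage is the per-column count of missing values divided by the number of rows. A small sketch on made-up data (the name 'toy' and its values are assumptions; the column names are borrowed from the output above) also shows an equivalent shortcut that takes the mean of the boolean mask:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 4 rows and a few missing entries.
toy = pd.DataFrame({
    'Ord_id': ['a', 'b', 'c', 'd'],
    'Profit': [1.0, np.nan, 3.0, 4.0],
    'Shipping_Cost': [np.nan, np.nan, 2.0, 1.0],
})

missing_count = toy.isnull().sum()                                 # per column
missing_pct = round(100 * toy.isnull().sum() / len(toy.index), 2)

# The mean of a True/False column is the fraction of True values.
missing_pct_alt = (toy.isnull().mean() * 100).round(2)
print(missing_pct)
```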

Removing Missing Values From the Rows

Description Remove the rows that have more than 5 missing values and then print the percentage of missing values in each column.

import pandas as pd

df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')

df = df[df.isnull().sum(axis=1)<=5]

print(round(100*df.isnull().sum()/len(df.index),2))
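The filter keeps a row when its count of missing values is at most 5. A sketch on a made-up 7-column dataframe (the name 'toy' and its contents are assumptions) checks this, and shows that dropna with the thresh argument, which counts the non-missing values required to keep a row, gives the same rows:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 3 rows, 7 columns, initially complete.
toy = pd.DataFrame(np.arange(21, dtype=float).reshape(3, 7),
                   columns=list('abcdefg'))
toy.iloc[1, :6] = np.nan   # row 1 now has 6 missing values
toy.iloc[2, :5] = np.nan   # row 2 now has 5 missing values

# Keep rows with at most 5 missing values (row 1 is dropped).
kept_mask = toy[toy.isnull().sum(axis=1) <= 5]

# Equivalent: require at least (number of columns - 5) non-missing values.
kept_thresh = toy.dropna(thresh=toy.shape[1] - 5)
print(kept_mask.index.tolist())
```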


Mean Imputation

Description Impute the mean value at all the missing values of the column 'Product_Base_Margin' and then print the percentage of missing values in each column.

import numpy as np

import pandas as pd

df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')

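# Fill only the missing entries; a plain assignment of the mean would overwrite every row

df['Product_Base_Margin'] = df['Product_Base_Margin'].fillna(df['Product_Base_Margin'].mean())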

print(round(100*df.isnull().sum()/len(df.index),2))

import numpy as np

import pandas as pd

df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')

df.loc[np.isnan(df['Product_Base_Margin']), ['Product_Base_Margin']] = df['Product_Base_Margin'].mean()

print(round(100*(df.isnull().sum()/len(df.index)), 2))
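Mean imputation should change only the missing entries and leave the observed values as they are. A quick check on made-up data (the name 'toy' and its values are assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical column with two observed and two missing values.
toy = pd.DataFrame({'Product_Base_Margin': [0.5, np.nan, 0.7, np.nan]})

# mean() skips NaN by default, so it averages 0.5 and 0.7 only.
mean_value = toy['Product_Base_Margin'].mean()

# fillna replaces the NaN entries and keeps 0.5 and 0.7 unchanged.
toy['Product_Base_Margin'] = toy['Product_Base_Margin'].fillna(mean_value)
print(toy)
```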

Jupyter Notebook File

https://colab.research.google.com/drive/14ihb__mj99TlC65usdp9bLQpjlIfSkfX?authuser=1#scrollTo=bE5HHI0OJ48f