Data Import and Export:
Data Import:
- CSV (Comma-Separated Values):
- R: Use the read.csv() function.
- Python (pandas): Use pandas.read_csv().
- Excel Files:
- R: Use read_excel() from the readxl package, or read.xlsx() from the openxlsx package.
- Python (pandas): Use pandas.read_excel().
- Text Files (txt):
- R: Use readLines() for plain text, or read.table() for structured (delimited) text files.
- Python: Use the built-in open() for plain text, or libraries like pandas (e.g., pandas.read_csv() with a custom sep) for structured data.
- JSON (JavaScript Object Notation):
- R: Use fromJSON() from the jsonlite package.
- Python: Use json.load() (from a file) or json.loads() (from a string) in the built-in json module, or pandas.read_json() to load JSON directly into a DataFrame.
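The Python import calls above can be tried end to end with the sketch below. It assumes pandas is installed; the file names and sample data are hypothetical and are written to a temporary folder only so the example runs anywhere. Excel is left out because pandas.read_excel() additionally needs an engine package (such as openpyxl) for .xlsx files.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample files in a temporary folder; replace with your own paths.
folder = Path(tempfile.mkdtemp())
(folder / "scores.csv").write_text("name,score\nAda,91\nGrace,88\n", encoding="utf-8")
(folder / "scores.txt").write_text("name\tscore\nAda\t91\nGrace\t88\n", encoding="utf-8")
(folder / "scores.json").write_text(
    '[{"name": "Ada", "score": 91}, {"name": "Grace", "score": 88}]', encoding="utf-8"
)

# CSV: read_csv() uses the first row as the header and infers column types.
df_csv = pd.read_csv(folder / "scores.csv")

# Structured text: the same reader with a tab separator.
df_txt = pd.read_csv(folder / "scores.txt", sep="\t")

# Plain text: the built-in open() returns a file object; read the raw lines.
with open(folder / "scores.txt", encoding="utf-8") as fh:
    lines = fh.read().splitlines()

# JSON with the standard library: json.load() reads from a file object.
with open(folder / "scores.json", encoding="utf-8") as fh:
    records = json.load(fh)

# JSON with pandas: a list of objects becomes one row per object.
df_json = pd.read_json(folder / "scores.json", orient="records")

print(df_csv)
print(lines)
print(records[0])
```

The same read_csv() call handles both the comma and tab files; only sep changes, which mirrors how read.csv() and read.table() relate in R.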
Data Export:
- CSV (Comma-Separated Values):
- R: Use write.csv(), or write.csv2() for locales that use a comma as the decimal separator (it writes semicolon-separated fields).
- Python (pandas): Use DataFrame.to_csv().
- Excel Files:
- R: Use packages like writexl (write_xlsx()) or openxlsx (write.xlsx()).
- Python (pandas): Use DataFrame.to_excel().
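- Text Files (txt):
- R: Use writeLines() for plain text, or write.table() for structured (delimited) text files.
- Python: Use the built-in open() in write mode for plain text, or libraries like pandas (e.g., DataFrame.to_csv() with a custom sep) for structured data.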
- JSON (JavaScript Object Notation):
- R: Use toJSON() from the jsonlite package.
- Python: Use json.dump() (to a file) or json.dumps() (to a string) from the built-in json module, or DataFrame.to_json() in pandas.
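A matching sketch for the Python export calls, again assuming pandas is installed and using hypothetical file names in a temporary folder. The semicolon/comma-decimal file is written with to_csv(sep=";", decimal=","), which follows the same convention as R's write.csv2(); this is one way to get that output, not a pandas equivalent of write.csv2() by name. Excel output is omitted because to_excel() needs an extra engine package such as openpyxl.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical records kept as plain Python types so json.dump() can serialize them.
records = [{"name": "Ada", "score": 91.5}, {"name": "Grace", "score": 88.0}]
df = pd.DataFrame(records)
out = Path(tempfile.mkdtemp())

# CSV: index=False leaves out pandas' row-index column.
df.to_csv(out / "scores.csv", index=False)

# Semicolon-separated fields with comma decimals (the write.csv2() convention).
df.to_csv(out / "scores_eu.csv", sep=";", decimal=",", index=False)

# Structured text: tab-separated.
df.to_csv(out / "scores.txt", sep="\t", index=False)

# Plain text: open() in write mode ("w").
with open(out / "notes.txt", "w", encoding="utf-8") as fh:
    fh.write("first line\nsecond line\n")

# JSON with the standard library.
with open(out / "scores.json", "w", encoding="utf-8") as fh:
    json.dump(records, fh, indent=2)

# JSON with pandas: one object per row.
df.to_json(out / "scores_pandas.json", orient="records")

print(sorted(p.name for p in out.iterdir()))
```

Passing index=False matters in practice: without it, pandas writes an unnamed index column that shows up as an extra column when the file is read back.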
Attributes and Data Types:
Attributes:
- Nominal Attribute:
- Categories with no order or ranking (e.g., colors, types).
- Ordinal Attribute:
- Categories with a specific order or ranking (e.g., low, medium, high).
- Interval Attribute:
- Data with a consistent interval between values, but no true zero point (e.g., temperature in Celsius).
- Ratio Attribute:
- Data with a consistent interval between values and a true zero point (e.g., height, weight).
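The four attribute levels differ in which operations are meaningful. The sketch below illustrates this with Python's standard library only; modelling nominal values as an Enum and ordinal values as an IntEnum is one possible encoding chosen for illustration (pandas ordered categoricals are another), and the class names and values are hypothetical.

```python
from enum import Enum, IntEnum


# Nominal: labels with no order. Plain Enum members support == but not <.
class Color(Enum):
    RED = "red"
    BLUE = "blue"


# Ordinal: IntEnum attaches a rank, so comparisons, sorted() and max() follow it.
# The numbers encode order only; differences between them carry no meaning.
class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


readings = [Level.MEDIUM, Level.LOW, Level.HIGH]
ranked = sorted(readings)
highest = max(readings)

# Interval: Celsius has an arbitrary zero, so differences are meaningful
# but ratios are not (20 °C is not "twice as hot" as 10 °C).
cool_c, warm_c = 10.0, 20.0
difference_c = warm_c - cool_c
# Converting to Kelvin (true zero at absolute zero) makes the ratio meaningful.
kelvin_ratio = (warm_c + 273.15) / (cool_c + 273.15)

# Ratio: weight has a true zero, so "twice as heavy" is meaningful.
light_kg, heavy_kg = 2.0, 4.0
weight_ratio = heavy_kg / light_kg

print(ranked, highest, difference_c, round(kelvin_ratio, 3), weight_ratio)
```

The Kelvin conversion shows why the interval/ratio distinction depends on the zero point: the same temperatures give a ratio of 2.0 in Celsius but only about 1.035 on an absolute scale.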
Data Types:
- Numeric Data Types:
- Integer: Whole numbers, typically used for discrete counts (e.g., 1, 2, -3).
- Float (or Double): Numbers with a fractional part, typically used for continuous measurements (e.g., 1.5, -0.003).
- Text and Categorical Data Types:
- Character/String: Text data (e.g., “hello”, “category A”).
- Factor: Categorical data with a fixed set of predefined levels (R's factor type; pandas has a similar category dtype).
- Boolean Data Type:
- Represents true or false values (e.g., TRUE/FALSE in R, True/False in Python).
- Date and Time Data Types:
- Date: Represents calendar dates (e.g., “2022-09-27”).
- Time: Represents time of day (e.g., “14:30:00”).
- DateTime (or Timestamp): Represents both date and time.
- Compound Data Types:
- Many languages also provide types that group several values together, such as lists, dictionaries, or data frames.
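For reference, the sketch below maps the types listed above onto Python: built-in scalars from the standard library, pandas' category dtype as a rough analogue of R's factor, and a DataFrame whose columns each carry their own dtype. It assumes pandas is installed; the sample values reuse the examples from the list, and the column names are hypothetical.

```python
from datetime import date, datetime, time

import pandas as pd

# Python's built-in scalar types for the categories listed above.
samples = {
    "integer": -3,
    "float": -0.003,
    "string": "category A",
    "boolean": True,
    "date": date(2022, 9, 27),
    "time": time(14, 30, 0),
    "datetime": datetime(2022, 9, 27, 14, 30, 0),
}
type_names = {label: type(value).__name__ for label, value in samples.items()}

# Factor-like data: pandas' category dtype with fixed, ordered levels.
level_dtype = pd.CategoricalDtype(categories=["low", "medium", "high"], ordered=True)
levels = pd.Series(["low", "high", "low"], dtype=level_dtype)

# A data frame holds columns of different types; each column has its own dtype.
df = pd.DataFrame(
    {
        "count": [1, 2],
        "price": [1.5, 2.0],
        "label": ["hello", "category A"],
        "flag": [True, False],
        "timestamp": pd.to_datetime(["2022-09-27 14:30:00", "2022-09-28 09:00:00"]),
    }
)

print(type_names)
print(levels.cat.categories.tolist())
print(df.dtypes)
```

Checking df.dtypes right after import is a quick way to catch columns that were read with the wrong type (for example, dates left as text) before any analysis.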
Understanding these concepts helps in correctly handling and analyzing data, as different types of data may require different processing and visualization techniques. Additionally, it’s essential for data preprocessing and feature engineering when building machine learning models.