Commit 0c0c9e6

DTAT: sources.folder (#296)
* Add files via upload
* Add files via upload
* Update .gitignore
1 parent 46e3f72 commit 0c0c9e6

2 files changed

Lines changed: 267 additions & 1 deletion

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -7,6 +7,7 @@ deeptrack-app/*
 lightning_logs
 
 paper-examples/models/*
+tutorials/3-advanced-topics/dummy_directory/
 
 build/*
 dist/*
@@ -31,4 +32,4 @@ examples/**/*/models/
 
 *_dataset/
 
-.DS_Store
+.DS_Store
Lines changed: 265 additions & 0 deletions
@@ -0,0 +1,265 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# deeptrack.sources.folder\n",
+    "\n",
+    "<a href=\"https://colab.research.google.com/github/DeepTrackAI/DeepTrack2/blob/develop/tutorials/3-advanced-topics/DTAT391B_sources.folder.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# !pip install deeptrack  # Uncomment if running on Colab/Kaggle."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This advanced tutorial introduces the `sources.folder` module."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. What is `folder`?\n",
+    "\n",
+    "The `folder` module manages image datasets organized in a directory hierarchy. It contains a single class, `ImageFolder`, which provides utilities for structured naming, organization, and retrieval of image data."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Creating a Directory Structure\n",
+    "\n",
+    "Since the `ImageFolder` class expects images to be stored in directories named after their classes, we first create a dummy directory structure for demonstration purposes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 29,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import shutil\n",
+    "\n",
+    "from deeptrack.sources import folder\n",
+    "\n",
+    "\n",
+    "# Define root directory.\n",
+    "dataset_path = \"dummy_dataset\"\n",
+    "\n",
+    "# Define class names.\n",
+    "classes = [\"cat\", \"dog\", \"bird\"]\n",
+    "\n",
+    "# Remove the directory if it already exists.\n",
+    "if os.path.exists(dataset_path):\n",
+    "    shutil.rmtree(dataset_path)\n",
+    "\n",
+    "# Create directories.\n",
+    "for class_name in classes:\n",
+    "    os.makedirs(os.path.join(dataset_path, class_name))\n",
+    "\n",
+    "# Create some empty dummy files.\n",
+    "for class_name in classes:\n",
+    "    for i in range(3):\n",
+    "        with open(os.path.join(dataset_path, class_name, f\"image_{i}.jpg\"), \"w\") as f:\n",
+    "            f.write(\"\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. Initializing an `ImageFolder`\n",
+    "Now that the dummy directory structure exists, we initialize an `ImageFolder` object."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 30,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Total images in dataset: 9\n",
+      "Classes: ['cat', 'bird', 'dog']\n"
+     ]
+    }
+   ],
+   "source": [
+    "data_source = folder.ImageFolder(dataset_path)\n",
+    "\n",
+    "# Print total number of images.\n",
+    "print(f\"Total images in dataset: {len(data_source)}\")\n",
+    "\n",
+    "# Print class names.\n",
+    "print(f\"Classes: {data_source.classes}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. Getting Category Names from File Paths\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 31,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Category of dummy_dataset/dog/image_1.jpg: dog\n"
+     ]
+    }
+   ],
+   "source": [
+    "example_path = os.path.join(dataset_path, \"dog\", \"image_1.jpg\")\n",
+    "category = data_source.get_category_name(example_path, directory_level=0)\n",
+    "print(f\"Category of {example_path}: {category}\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 5. Dataset Splitting\n",
+    "\n",
+    "If the dataset is organized into top-level subdirectories (e.g., `train/cat`, `test/dog`), we can split it according to those subdirectories."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Train set classes: ['cat']\n",
+      "Test set classes: ['dog']\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Create directories if they don't exist.\n",
+    "train_dir = os.path.join(dataset_path, \"train\")\n",
+    "test_dir = os.path.join(dataset_path, \"test\")\n",
+    "os.makedirs(train_dir, exist_ok=True)\n",
+    "os.makedirs(test_dir, exist_ok=True)\n",
+    "\n",
+    "\n",
+    "# Define source and destination paths.\n",
+    "cat_src = os.path.join(dataset_path, \"cat\")\n",
+    "cat_dest = os.path.join(train_dir, \"cat\")\n",
+    "\n",
+    "dog_src = os.path.join(dataset_path, \"dog\")\n",
+    "dog_dest = os.path.join(test_dir, \"dog\")\n",
+    "\n",
+    "\n",
+    "# Move only if source exists and destination does not.\n",
+    "if os.path.exists(cat_src) and not os.path.exists(cat_dest):\n",
+    "    shutil.move(cat_src, train_dir)\n",
+    "\n",
+    "if os.path.exists(dog_src) and not os.path.exists(dog_dest):\n",
+    "    shutil.move(dog_src, test_dir)\n",
+    "\n",
+    "split_data_source = folder.ImageFolder(dataset_path)\n",
+    "\n",
+    "# Split into train and test.\n",
+    "train, test = split_data_source.split(\"train\", \"test\")\n",
+    "\n",
+    "print(f\"Train set classes: {train.classes}\")\n",
+    "print(f\"Test set classes: {test.classes}\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 6. Printing the Directory Structure\n",
+    "The directory structure that results from splitting the dataset can be printed by running the code cell below."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "📂 dummy_dataset\n",
+      "    📂 test\n",
+      "        📂 dog\n",
+      "            📄 image_0.jpg\n",
+      "            📄 image_1.jpg\n",
+      "            📄 image_2.jpg\n",
+      "    📂 train\n",
+      "        📂 cat\n",
+      "            📄 image_0.jpg\n",
+      "            📄 image_1.jpg\n",
+      "            📄 image_2.jpg\n",
+      "    📂 bird\n",
+      "        📄 image_0.jpg\n",
+      "        📄 image_1.jpg\n",
+      "        📄 image_2.jpg\n"
+     ]
+    }
+   ],
+   "source": [
+    "for root, dirs, files in os.walk(dataset_path):\n",
+    "\n",
+    "    # Get depth of directory for indenting the print text.\n",
+    "    depth = root.replace(dataset_path, \"\").count(os.sep)\n",
+    "    indent = \"    \" * depth\n",
+    "\n",
+    "    # Directories.\n",
+    "    directory_name = os.path.basename(root)\n",
+    "    print(f\"{indent}📂 {directory_name}\")\n",
+    "\n",
+    "    # Files.\n",
+    "    for filename in sorted(files):\n",
+    "        print(f\"{indent}    📄 {filename}\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": ".venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
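
Note on the notebook's directory handling: the layout and lookups it demonstrates can be sanity-checked without DeepTrack installed. The sketch below uses only the standard library and builds the same `train/cat` and `test/dog` layout in a temporary directory. The helpers `build_dummy_dataset`, `category_from_path`, and `tree_lines` are hypothetical names introduced here for illustration; they are not part of `deeptrack.sources.folder`, and `category_from_path` only assumes that a level of 0 means the first directory below the root, which is what the recorded output of `get_category_name(..., directory_level=0)` suggests.

```python
import tempfile
from pathlib import Path


def build_dummy_dataset(root, classes, images_per_class=3):
    """Create root/<class>/image_<i>.jpg as empty placeholder files."""
    for class_name in classes:
        class_dir = Path(root) / class_name
        class_dir.mkdir(parents=True, exist_ok=True)
        for i in range(images_per_class):
            (class_dir / f"image_{i}.jpg").touch()


def category_from_path(path, root, level=0):
    """Return the directory name `level` steps below `root` on the way to `path`.

    Hypothetical helper for illustration, not DeepTrack's implementation.
    """
    parts = Path(path).relative_to(root).parts
    directories = parts[:-1]  # Drop the file name itself.
    return directories[level]


def tree_lines(root, indent="    "):
    """Return an indented listing of `root`, directories first, sorted by name."""
    root = Path(root)
    lines = [f"📂 {root.name}"]

    def walk(directory, depth):
        entries = sorted(directory.iterdir(), key=lambda p: (p.is_file(), p.name))
        for entry in entries:
            marker = "📂" if entry.is_dir() else "📄"
            lines.append(f"{indent * depth}{marker} {entry.name}")
            if entry.is_dir():
                walk(entry, depth + 1)

    walk(root, 1)
    return lines


with tempfile.TemporaryDirectory() as tmp:
    dataset = Path(tmp) / "dummy_dataset"
    build_dummy_dataset(dataset / "train", ["cat"])
    build_dummy_dataset(dataset / "test", ["dog"])

    example = dataset / "test" / "dog" / "image_1.jpg"
    split_name = category_from_path(example, dataset, level=0)
    class_name = category_from_path(example, dataset, level=1)
    listing = tree_lines(dataset)

print(split_name, class_name)
print("\n".join(listing))
```

Because `tree_lines` sorts entries explicitly, its listing does not depend on the order in which the file system returns directories, unlike a bare `os.walk` traversal.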
